Import the data provided in the outputs.json file from your Terminal. Name the database Baseball_db and the collection Players.

Within this markdown cell, copy the line of text you used to import the data from your Terminal. This way, future analysts will be able to repeat your process.

e.g.: Import the dataset with `mongoimport --type json -d Baseball_db -c Players --drop --jsonArray outputs.json`
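The import described above can also be assembled in Python, which makes the step easier to repeat from a script. This is a minimal sketch, not the method used in this notebook: `build_mongoimport_cmd` is a hypothetical helper, and it assumes `mongoimport` is installed and on the PATH and that `outputs.json` holds a single JSON array.

```python
import shlex
import subprocess


def build_mongoimport_cmd(db, collection, json_path):
    """Return the mongoimport argument list for a JSON-array file (hypothetical helper)."""
    return [
        "mongoimport",
        "--type", "json",
        "--db", db,
        "--collection", collection,
        "--drop",        # drop the collection first so reruns do not duplicate documents
        "--jsonArray",   # the file holds one JSON array rather than one document per line
        "--file", json_path,
    ]


cmd = build_mongoimport_cmd("Baseball_db", "Players", "outputs.json")
print(shlex.join(cmd))  # show the equivalent Terminal command

# Uncomment to actually run the import (assumes mongoimport is installed and on PATH):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if the file path ever contains spaces.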

In [1]:
# Import dependencies
from pymongo import MongoClient
from pprint import pprint
import pandas as pd
from pathlib import Path

In [2]:
# Create an instance of MongoClient
mongo = MongoClient(port=27017)

In [3]:
# confirm that our new database was created
print(mongo.list_database_names())

['Baseball_db', 'admin', 'autosaurus', 'classDB', 'config', 'local', 'met', 'petsitly_marketing', 'travel_db', 'uk_food']
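Reading the list by eye works, but a programmatic check fails fast if the import was skipped. The sketch below assumes a hypothetical helper, `require_database`, that accepts any client exposing `list_database_names()`, such as the `mongo` object created earlier. MongoDB only lists a database once it contains data, so a missing name usually means nothing was imported.

```python
def require_database(client, name):
    """Raise if `name` is missing from the server's database list (hypothetical helper)."""
    names = client.list_database_names()
    if name not in names:
        raise LookupError(f"{name!r} not found; available databases: {sorted(names)}")
    return name


# Usage in the notebook, assuming `mongo` is the MongoClient created earlier:
# require_database(mongo, "Baseball_db")
```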


In [4]:
# assign the Baseball_db database to a variable name
db = mongo['Baseball_db']

In [5]:
# review the collections in our new database
db.list_collection_names()

['Master', 'GeoData', 'HallOfFame', 'All-Star']

In [6]:
# review a document in the Master collection
document = db.Master.find_one()
pprint(document)

{'_id': ObjectId('669ee7f68a102274f0b81b78'),
 'bats': 'R',
 'bbrefID': 'aardsda01',
 'birthCity': 'Denver',
 'birthCountry': 'USA',
 'birthState': 'CO',
 'debut': '4/6/2004',
 'finalGame': '8/23/2015',
 'height': 75,
 'nameFirst': 'David',
 'nameGiven': 'David Allan',
 'nameLast': 'Aardsma',
 'playerID': 'aardsda01',
 'retroID': 'aardd001',
 'throws': 'R',
 'weight': 220}


In [14]:
# assign the Master collection to a variable name
master_collection = db['Master']

In [16]:
# Example: Remove 'birthCity', 'retroID', 'bbrefID' fields from all documents
master_collection.update_many({}, {'$unset': {'birthCity': '', 'retroID': '', 'bbrefID': ''}})

UpdateResult({'n': 18846, 'nModified': 18845, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)
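The raw result above reports `n` (documents matched by the filter) and `nModified` (documents whose contents actually changed). PyMongo exposes these as `matched_count` and `modified_count` on the `UpdateResult`. The sketch below uses a hypothetical helper, `summarize_update`, and a stand-in object carrying the numbers shown above so that it runs without a server. One plausible reading of the single unchanged document is that it contained none of the three fields, but the output alone does not confirm that.

```python
from types import SimpleNamespace


def summarize_update(result):
    """Describe how many documents an update matched and actually changed."""
    matched = result.matched_count      # documents selected by the filter ('n')
    modified = result.modified_count    # documents whose contents changed ('nModified')
    return {"matched": matched, "modified": modified, "unchanged": matched - modified}


# Stand-in for the UpdateResult shown above, so the sketch runs without a server.
example = SimpleNamespace(matched_count=18846, modified_count=18845)
summary = summarize_update(example)
print(summary)

# Usage in the notebook, assuming `master_collection` from the earlier cell:
# summarize_update(master_collection.update_many({}, {'$unset': {'birthCity': ''}}))
```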

In [21]:
for doc in master_collection.find():
    print(doc)

{'_id': ObjectId('669ee7f68a102274f0b81b78'), 'playerID': 'aardsda01', 'birthCountry': 'USA', 'birthState': 'CO', 'nameFirst': 'David', 'nameLast': 'Aardsma', 'nameGiven': 'David Allan', 'weight': 220, 'height': 75, 'bats': 'R', 'throws': 'R', 'debut': '4/6/2004', 'finalGame': '8/23/2015'}
{'_id': ObjectId('669ee7f68a102274f0b81b79'), 'playerID': 'aaronha01', 'birthCountry': 'USA', 'birthState': 'AL', 'nameFirst': 'Hank', 'nameLast': 'Aaron', 'nameGiven': 'Henry Louis', 'weight': 180, 'height': 72, 'bats': 'R', 'throws': 'R', 'debut': '4/13/1954', 'finalGame': '10/3/1976'}
{'_id': ObjectId('669ee7f68a102274f0b81b7a'), 'playerID': 'aaronto01', 'birthCountry': 'USA', 'birthState': 'AL', 'nameFirst': 'Tommie', 'nameLast': 'Aaron', 'nameGiven': 'Tommie Lee', 'weight': 190, 'height': 75, 'bats': 'R', 'throws': 'R', 'debut': '4/10/1962', 'finalGame': '9/26/1971'}
{'_id': ObjectId('669ee7f68a102274f0b81b7b'), 'playerID': 'aasedo01', 'birthCountry': 'USA', 'birthState': 'CA', 'nameFirst': 'Don

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)
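Printing all of the documents in the collection is what triggers the IOPub rate limit above. To verify an update, a handful of documents is usually enough. The sketch below assumes a hypothetical helper, `preview`, which relies on the `limit` argument of PyMongo's `find()` so that only `n` documents are fetched and printed.

```python
from pprint import pprint


def preview(collection, n=3, projection=None):
    """Print at most `n` documents from `collection` and return them as a list."""
    # Note: in PyMongo a limit of 0 means "no limit", so keep n >= 1 here.
    docs = list(collection.find({}, projection, limit=n))
    for doc in docs:
        pprint(doc)
    return docs


# Usage in the notebook, assuming `master_collection` from the cell above:
# preview(master_collection, n=3, projection={"_id": 0, "playerID": 1, "nameGiven": 1})
```

The optional projection trims each document to the fields being checked, which keeps the output short even for wide documents.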



In [30]:
# Add a fullName field by concatenating nameFirst and nameLast (pipeline-style update)
update_pipeline = [
    {'$set': {'fullName': {'$concat': ['$nameFirst', ' ', '$nameLast']}}}
]

master_collection.update_many({}, update_pipeline)

# Verify the update by printing documents
for doc in master_collection.find():
    print(doc)

{'_id': ObjectId('669ee7f68a102274f0b81b78'), 'playerID': 'aardsda01', 'birthCountry': 'USA', 'birthState': 'CO', 'nameFirst': 'David', 'nameLast': 'Aardsma', 'nameGiven': 'David Allan', 'weight': 220, 'height': 75, 'bats': 'R', 'throws': 'R', 'debut': '4/6/2004', 'finalGame': '8/23/2015', 'fullName': 'David Aardsma'}
{'_id': ObjectId('669ee7f68a102274f0b81b79'), 'playerID': 'aaronha01', 'birthCountry': 'USA', 'birthState': 'AL', 'nameFirst': 'Hank', 'nameLast': 'Aaron', 'nameGiven': 'Henry Louis', 'weight': 180, 'height': 72, 'bats': 'R', 'throws': 'R', 'debut': '4/13/1954', 'finalGame': '10/3/1976', 'fullName': 'Hank Aaron'}
{'_id': ObjectId('669ee7f68a102274f0b81b7a'), 'playerID': 'aaronto01', 'birthCountry': 'USA', 'birthState': 'AL', 'nameFirst': 'Tommie', 'nameLast': 'Aaron', 'nameGiven': 'Tommie Lee', 'weight': 190, 'height': 75, 'bats': 'R', 'throws': 'R', 'debut': '4/10/1962', 'finalGame': '9/26/1971', 'fullName': 'Tommie Aaron'}
{'_id': ObjectId('669ee7f68a102274f0b81b7b'), '

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)
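One caveat with `$concat`: it returns null if any argument is null or refers to a missing field, so a player without a `nameFirst` would end up with a null `fullName`. The sketch below is a more defensive variant, not the pipeline used in this notebook. `full_name_pipeline` is a hypothetical helper that wraps each part in `$ifNull` and trims the result with `$trim`.

```python
def full_name_pipeline(first="nameFirst", last="nameLast", target="fullName"):
    """Build an update pipeline that joins two name fields, tolerating missing values."""
    def or_empty(field):
        # $ifNull substitutes "" when the field is null or missing
        return {"$ifNull": [f"${field}", ""]}

    joined = {"$concat": [or_empty(first), " ", or_empty(last)]}
    # $trim removes the stray space left when one of the parts is empty
    return [{"$set": {target: {"$trim": {"input": joined}}}}]


pipeline = full_name_pipeline()

# Usage in the notebook (pipeline-style updates need MongoDB 4.2 or later):
# master_collection.update_many({}, pipeline)
```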



In [40]:
# Update documents: Unset 'nameLast' and 'nameFirst' fields
update_operation = {'$unset': {'nameLast': '', 'nameFirst': ''}}

master_collection.update_many({}, update_operation)


UpdateResult({'n': 18846, 'nModified': 0, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)
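Here `n` is 18846 but `nModified` is 0: every document matched the empty filter, yet none changed. That is consistent with the fields having already been removed by an earlier run of this cell, though the output alone does not prove it. A direct check is to count the documents that still carry either field. The sketch below assumes a hypothetical helper, `any_field_exists_filter`, that builds the filter for such a count.

```python
def any_field_exists_filter(fields):
    """Build a filter matching documents that still contain at least one of `fields`."""
    return {"$or": [{field: {"$exists": True}} for field in fields]}


leftover_filter = any_field_exists_filter(["nameFirst", "nameLast"])
print(leftover_filter)

# Usage in the notebook, assuming `master_collection` from earlier cells;
# a count of 0 would confirm that no document still carries either field:
# master_collection.count_documents(leftover_filter)
```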