# Exercise: Titanic Database
    
The JSON file `titanic.json` contains information about more than 1,000 passengers who were aboard the Titanic when it sank in 1912. The data suits a document database: not all entries have the same fields, since a lot of data is missing.

**1) Using pymongo, load the JSON file into a MongoDB database.**

In [1]:
import json
from pymongo import MongoClient

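# Connect to a local MongoDB server on the default port
client = MongoClient('localhost', 27017)
db = client.my_database
db.collection.drop()  # start from an empty collection so reruns do not duplicate documents
print('db is a', type(db))
print('db.collection is a', type(db.collection))

# json.load returns a list with one dict per passenger
with open('titanic.json', 'r') as data_file:
    data_json = json.load(data_file)
    db.collection.insert_many(data_json)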

db is a <class 'pymongo.synchronous.database.Database'>
db.collection is a <class 'pymongo.synchronous.collection.Collection'>
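
`insert_many` rejects an empty list and expects every item to be a document (a dict here). As a minimal sketch, the hypothetical helper `load_documents` below (not part of pymongo or of the original exercise) checks both conditions before anything is sent to the database.

```python
import json

def load_documents(path):
    """Read a JSON file and return its contents as a list of dicts.

    Hypothetical helper: raises ValueError if the file does not hold a
    non-empty JSON array of objects.
    """
    with open(path, 'r', encoding='utf-8') as data_file:
        data = json.load(data_file)
    if not isinstance(data, list) or not data:
        raise ValueError('expected a non-empty JSON array')
    if not all(isinstance(doc, dict) for doc in data):
        raise ValueError('every array element must be a JSON object')
    return data

# Usage, assuming the `db` object from the cell above:
# db.collection.insert_many(load_documents('titanic.json'))
```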


**2) Find out exactly how many passengers (number of documents) there are in the database.**

In [2]:
db.collection.count_documents({})

1309

**3) How old was passenger "Bourke, Mr. John"? (use the fields `name` and `age`)**

In [3]:
db.collection.find_one(
    {'name': 'Bourke, Mr. John'},
    {'age': 1, '_id': 0}
)

{'age': 40.0}
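
`find_one` returns `None` when no document matches, and a matching document may still lack the `age` field. A small sketch that handles both cases follows; `passenger_age` is a hypothetical helper name, and `collection` is assumed to be a pymongo collection such as `db.collection`.

```python
def passenger_age(collection, name):
    """Return the `age` of the first passenger called `name`, or None."""
    doc = collection.find_one({'name': name}, {'age': 1, '_id': 0})
    # find_one returns None when nothing matches; .get covers a missing `age` field
    return doc.get('age') if doc is not None else None

# Usage, assuming the `db` object from the first cell:
# passenger_age(db.collection, 'Bourke, Mr. John')
```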

**4) The field `survived` tells us whether a passenger survived (value 1) or not (value 0). Find out how many survived and how many did not (note: many have missing data).**

In [4]:
db.collection.count_documents(
    {'survived': 1.0}
)

342

In [5]:
db.collection.count_documents(
    {'survived': 0.0}
)

549
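
The two counts do not add up to the number of documents because some passengers have no recorded outcome. One way to count those as well is sketched below, assuming that "missing" means the `survived` field is absent from the document (rather than stored as `null`); `survival_counts` is a hypothetical helper name.

```python
def survival_counts(collection):
    """Count survivors, non-survivors, and documents lacking `survived`."""
    return {
        'survived': collection.count_documents({'survived': 1}),
        'died': collection.count_documents({'survived': 0}),
        # $exists: False matches documents where the field is absent
        'unknown': collection.count_documents({'survived': {'$exists': False}}),
    }

# Usage, assuming the `db` object from the first cell:
# survival_counts(db.collection)
```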

**5) Who was the oldest survivor of the Titanic?**

In [6]:
list(db.collection.aggregate([
    {'$match': {'survived': 1}},  # keep survivors only
    {'$group': {'_id': None, 'max_age': {'$max': '$age'}}}  # highest age among them
]))

[{'_id': None, 'max_age': 80.0}]

In [7]:
db.collection.find_one(
    {'age': 80, 'survived': 1}, 
    {'name': 1, 'age': 1, '_id': 0}
)

{'name': 'Barkworth, Mr. Algernon Henry Wilson', 'age': 80.0}
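
The two steps above (find the maximum age, then look up a passenger with that age) can also be combined into one query by sorting on `age` in descending order. This is a sketch of that alternative, not the exercise's reference solution; `oldest_survivor` is a hypothetical helper name, and if several survivors share the highest age it returns only one of them.

```python
def oldest_survivor(collection):
    """Return name and age of the oldest survivor in one query, or None."""
    return collection.find_one(
        # $type 'number' skips documents whose age is missing or not numeric
        {'survived': 1, 'age': {'$type': 'number'}},
        {'name': 1, 'age': 1, '_id': 0},
        sort=[('age', -1)],  # descending, so the first match is the oldest
    )

# Usage, assuming the `db` object from the first cell:
# oldest_survivor(db.collection)
```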

**6) Find the survival rate (survivors/total) for each ticket class. (Use the field `class`; there were three classes: 1, 2, and 3.)**

In [11]:
total_firstclass = db.collection.count_documents(
    {'class': 1.0}
)

total_secondclass = db.collection.count_documents(
    {'class': 2.0}
)

total_thirdclass = db.collection.count_documents(
    {'class': 3.0}
)

survived_firstclass = db.collection.count_documents(
    {'class': 1.0, 'survived': 1.0}
)

survived_secondclass = db.collection.count_documents(
    {'class': 2.0, 'survived': 1.0}
)

survived_thirdclass = db.collection.count_documents(
    {'class': 3.0, 'survived': 1.0}
)

print("survival rate 1st class:", survived_firstclass/total_firstclass)
print("survival rate 2nd class:", survived_secondclass/total_secondclass)
print("survival rate 3rd class:", survived_thirdclass/total_thirdclass)

survival rate 1st class: 0.42024539877300615
survival rate 2nd class: 0.3161764705882353
survival rate 3rd class: 0.1671388101983003
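
The totals above count every passenger in a class, including those whose `survived` value is missing. If the rate should instead be taken over passengers with a recorded outcome, one possible approach is a single aggregation pipeline, sketched below; `survival_rate_by_class` is a hypothetical helper name, and its results would differ from the figures printed above because the denominators are smaller.

```python
def survival_rate_by_class(collection):
    """Return [{'_id': <class>, 'rate': <fraction>, 'total': <n>}, ...]."""
    pipeline = [
        # keep only passengers whose outcome is recorded
        {'$match': {'survived': {'$in': [0, 1]}}},
        # `survived` is 0 or 1, so its average is the fraction of survivors
        {'$group': {'_id': '$class',
                    'rate': {'$avg': '$survived'},
                    'total': {'$sum': 1}}},
        {'$sort': {'_id': 1}},
    ]
    return list(collection.aggregate(pipeline))

# Usage, assuming the `db` object from the first cell:
# survival_rate_by_class(db.collection)
```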


**7) Which five passengers paid the five highest ticket prices? (use the field `fare`)**
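
Since only five passengers are asked for, the sorted cursor can be cut off with `limit`. A minimal sketch, with `top_fares` as a hypothetical helper name: it skips documents without a numeric `fare`, and because several passengers can share a fare, the order among equal fares is not guaranteed.

```python
def top_fares(collection, n=5):
    """Return the `n` documents with the highest numeric `fare`."""
    cursor = (
        collection.find({'fare': {'$type': 'number'}},
                        {'name': 1, 'fare': 1, '_id': 0})
        .sort('fare', -1)   # highest fare first
        .limit(n)           # stop after n documents
    )
    return list(cursor)

# Usage, assuming the `db` object from the first cell:
# top_fares(db.collection)
```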

In [15]:
list(db.collection.find({}, {'name': 1, 'fare': 1, '_id': 0}).sort("fare", -1))

[{'name': 'Cardeza, Mr. Thomas Drake Martinez', 'fare': 512.3292},
 {'name': 'Cardeza, Mrs. James Warburton Martinez (Charlotte Wardle Drake)',
  'fare': 512.3292},
 {'name': 'Ward, Miss. Anna', 'fare': 512.3292},
 {'name': 'Lesurer, Mr. Gustave J', 'fare': 512.3292},
 {'name': 'Fortune, Miss. Ethel Flora', 'fare': 263.0},
 {'name': 'Fortune, Miss. Alice Elizabeth', 'fare': 263.0},
 {'name': 'Fortune, Miss. Mabel Helen', 'fare': 263.0},
 {'name': 'Fortune, Mr. Charles Alexander', 'fare': 263.0},
 {'name': 'Fortune, Mrs. Mark (Mary McDougald)', 'fare': 263.0},
 {'name': 'Fortune, Mr. Mark', 'fare': 263.0},
 {'name': 'Ryerson, Master. John Borie', 'fare': 262.375},
 {'name': 'Ryerson, Miss. Susan Parker "Suzette"', 'fare': 262.375},
 {'name': 'Bowen, Miss. Grace Scott', 'fare': 262.375},
 {'name': 'Ryerson, Mr. Arthur Larned', 'fare': 262.375},
 {'name': 'Ryerson, Miss. Emily Borie', 'fare': 262.375},
 {'name': 'Ryerson, Mrs. Arthur Larned (Emily Maria Borie)', 'fare': 262.375},
 {'name'