# Week 3 Day 2: CSV Files

* CSV files
* JSON Files



## CSV Files
comma-separated values

The most common import and export format for spreadsheets and databases. CSV format was used for many years prior to attempts to describe the format in a standardized way.
The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.

In [3]:
# python has a library that simlifies reading csv files
import csv

#### csv.reader(csvfile, dialect='excel', **fmtparams)

Return a reader object that will process lines from the given csvfile. A csvfile must be an iterable of strings, each in the reader’s defined csv format.

We wil make a csvReader to read the csv file this creates an object that we can use to read csv files. A csvReader is a stream based parser for CSV files.

link to documentation and info - [https://docs.python.org/3/library/csv.html]

In [4]:
#create a list to append the rows
myList = []

# we use with because it automatically closes the file when we're done with it
with open("data/movies.csv", 'r') as f:
    # file handle: not the contents of the file, but the way python "talks" to the file itself
    csvReader = csv.reader(f, delimiter= ',', quotechar='"')

    # skip the header
    next(csvReader) # start here,  pass by header
    for row in csvReader:
        myList.append(row)
# out here (not indented), the file is closed


In [5]:
# Data access
myList

[['0',
  'The Shawshank Redemption',
  '(1994)',
  '142 mins.',
  '9.3',
  '1,432,740',
  "[u'Crime', u'Drama']",
  'Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.',
  '$28.3M'],
 ['1',
  'The Dark Knight',
  '(2008)',
  '152 mins.',
  '9.0',
  '1,403,109',
  "[u'Action', u'Crime', u'Drama']",
  'When the menace known as the Joker wreaks havoc and chaos on the people of Gotham, the caped crusader must come to terms with one of the greatest psychological tests of his ability to fight injustice.',
  '$533M'],
 ['2',
  'Inception',
  '(2010)',
  '148 mins.',
  '8.8',
  '1,202,244',
  "[u'Action', u'Mystery', u'Sci-Fi', u'Thriller']",
  'A thief who steals corporate secrets through use of dream-sharing technology is given the inverse task of planting an idea into the mind of a CEO.',
  '$293M'],
 ['3',
  'Fight Club',
  '(1999)',
  '139 mins.',
  '8.9',
  '1,117,474',
  "[u'Drama']",
  'An insomniac office worker looki

In [6]:
#get the first sublist
myList[0]

['0',
 'The Shawshank Redemption',
 '(1994)',
 '142 mins.',
 '9.3',
 '1,432,740',
 "[u'Crime', u'Drama']",
 'Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.',
 '$28.3M']

In [17]:
# how many items
len(myList)

6518

In [18]:
#get the actual text we want
myList[0] # gets first think
# we want 7th thing
myList[0][7]

'Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.'

In [7]:
#print out every genre that is mentioned in the csv file
for movie in myList:
    print(movie[6]) # print item at position of genre

[u'Crime', u'Drama']
[u'Action', u'Crime', u'Drama']
[u'Action', u'Mystery', u'Sci-Fi', u'Thriller']
[u'Drama']
[u'Crime', u'Drama']
[u'Adventure', u'Fantasy']
[u'Adventure', u'Fantasy']
[u'Action', u'Sci-Fi']
[u'Drama', u'Romance']
[u'Crime', u'Drama']
[u'Action', u'Thriller']
[u'Adventure', u'Fantasy']
[u'Drama', u'Mystery', u'Thriller']
[u'Action', u'Drama']
[u'Action', u'Adventure']
[u'Action', u'Adventure', u'Sci-Fi']
[u'Action', u'Adventure', u'Fantasy', u'Sci-Fi']
[u'Western']
[u'Adventure', u'Drama', u'Fantasy']
[u'Adventure', u'Drama', u'Fantasy']
[u'Adventure', u'Drama', u'Fantasy']
[u'Adventure', u'Drama', u'Fantasy']
[u'Action', u'Adventure', u'Fantasy', u'Sci-Fi']
[u'Action', u'Drama', u'War']
[u'Crime', u'Drama', u'Thriller']
[u'Drama', u'Thriller']
[u'Biography', u'Drama', u'History']
[u'Mystery', u'Thriller']
[u'Adventure', u'Drama', u'War']
[u'Drama']
[u'Drama', u'Mystery', u'Thriller']
[u'Adventure', u'Fantasy']
[u'Drama', u'Romance']
[u'Action', u'Drama', u'Thriller'

In [14]:
# clean the data 

# 1- create empty list
genreLyst_clean = []

# enumerate() genre list 
# 2 - iterate thru each item in list 
for movie in myList:
    # 3 - assign the genre to a variable 
    genre = movie[6]
    genreLyst = genre.split(',') # make list in case their are multiple genres

    # 4 - use enumerate 
    
    for index, item in enumerate(genreLyst):
        # go thru each item in the list of genres for that movie and clean
        
        #remove brackets
        genreLyst[index] = genreLyst[index].replace("[","").replace("]","")
        #remove single quotes
        genreLyst[index] = genreLyst[index].replace("'","") # get rid of single quotes
        # remove empty space
        genreLyst[index] = genreLyst[index].replace(" ","") #get rid of empty space
        # get rid of the first character 
        genreLyst[index] = genreLyst[index][1:] # get rid of u [1:] = start at index 1
    # add all cleaned items to new list and go to next movie 
    genreLyst_clean.append(genreLyst)
genreLyst_clean   


[['Crime', 'Drama'],
 ['Action', 'Crime', 'Drama'],
 ['Action', 'Mystery', 'Sci-Fi', 'Thriller'],
 ['Drama'],
 ['Crime', 'Drama'],
 ['Adventure', 'Fantasy'],
 ['Adventure', 'Fantasy'],
 ['Action', 'Sci-Fi'],
 ['Drama', 'Romance'],
 ['Crime', 'Drama'],
 ['Action', 'Thriller'],
 ['Adventure', 'Fantasy'],
 ['Drama', 'Mystery', 'Thriller'],
 ['Action', 'Drama'],
 ['Action', 'Adventure'],
 ['Action', 'Adventure', 'Sci-Fi'],
 ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'],
 ['Western'],
 ['Adventure', 'Drama', 'Fantasy'],
 ['Adventure', 'Drama', 'Fantasy'],
 ['Adventure', 'Drama', 'Fantasy'],
 ['Adventure', 'Drama', 'Fantasy'],
 ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'],
 ['Action', 'Drama', 'War'],
 ['Crime', 'Drama', 'Thriller'],
 ['Drama', 'Thriller'],
 ['Biography', 'Drama', 'History'],
 ['Mystery', 'Thriller'],
 ['Adventure', 'Drama', 'War'],
 ['Drama'],
 ['Drama', 'Mystery', 'Thriller'],
 ['Adventure', 'Fantasy'],
 ['Drama', 'Romance'],
 ['Action', 'Drama', 'Thriller'],
 ['Action', 'A

In [43]:


# make an empty dicitonary for movie genres
genreCount = {}


#iterate through the text and add the movie genere (key) and the amount of times 
# that it is mentioned (value)

for movie in myList: # iterate thru each movie 
    for genre in genreLyst_clean # iterate thru each genre for that movie 
        if genre in genreCount.keys(): # if genre alredy in dictionary keys, increase value by 1 
            genreCount[genre] += 1 
        else:
            genreCount[genre] = 1
genreCount

{"[u'Crime'": 530,
 " u'Drama']": 544,
 "[u'Action'": 1355,
 " u'Crime'": 645,
 " u'Mystery'": 586,
 " u'Sci-Fi'": 342,
 " u'Thriller']": 1890,
 "[u'Drama']": 301,
 "[u'Adventure'": 377,
 " u'Fantasy']": 296,
 " u'Sci-Fi']": 462,
 "[u'Drama'": 921,
 " u'Romance']": 1011,
 " u'Adventure']": 22,
 " u'Adventure'": 677,
 " u'Fantasy'": 419,
 "[u'Western']": 18,
 " u'Drama'": 1643,
 " u'War']": 259,
 "[u'Biography'": 256,
 " u'History']": 74,
 "[u'Mystery'": 55,
 " u'Mystery']": 179,
 " u'Biography'": 56,
 " u'History'": 153,
 "[u'Action']": 5,
 " u'Comedy'": 601,
 "[u'Animation'": 386,
 " u'Romance'": 307,
 " u'Family']": 144,
 " u'Family'": 460,
 " u'Musical']": 72,
 "[u'Comedy'": 1434,
 " u'Crime']": 124,
 "[u'Comedy']": 319,
 "[u'Horror']": 86,
 " u'Horror']": 99,
 "[u'Sci-Fi'": 24,
 "[u'Horror'": 265,
 " u'Sport']": 198,
 " u'Action'": 107,
 " u'Comedy']": 102,
 " u'Horror'": 264,
 "[u'Adventure']": 2,
 " u'Thriller'": 78,
 " u'Musical'": 87,
 "[u'Family'": 10,
 " u'Music']": 125,
 " u

### Exercise 1: 

import tv_shows.csv and find the 5 most recently made tv shows

In [23]:
import re
myList = []
with open("data/tv_shows.csv", 'r') as myFile:
    csvReader = csv.reader(myFile)
    # add to list
    # get list of tv shows and values
    # order from 0-5 index

    
    next(csvReader)

    for row in csvReader:
        myList.append(row)
myList

[['0',
  'Game of Thrones',
  '(2011 TV Series)',
  '55 mins.',
  '9.5',
  '748,557',
  "[u'Adventure', u'Drama', u'Fantasy']",
  'Several noble families fight for control of the mythical land of Westeros.'],
 ['1',
  'Breaking Bad',
  '(2008 TV Series)',
  '45 mins.',
  '9.5',
  '662,459',
  "[u'Crime', u'Drama', u'Thriller']",
  'A chemistry teacher diagnosed with a terminal lung cancer, teams up with his former student, Jesse Pinkman, to cook and sell crystal meth.'],
 ['2',
  'The Walking Dead',
  '(2010 TV Series)',
  '44 mins.',
  '8.7',
  '500,301',
  "[u'Drama', u'Horror']",
  "Sheriff's Deputy Rick Grimes leads a group of survivors in a world overrun by zombies."],
 ['3',
  'The Big Bang Theory',
  '(2007 TV Series)',
  '22 mins.',
  '8.5',
  '438,226',
  "[u'Comedy']",
  'A woman who moves into an apartment across the hall from two brilliant but socially awkward physicists shows them how little they know about life outside of the laboratory.'],
 ['4',
  'Dexter',
  '(2006 TV 

In [49]:
# 1 - convert years to int values and clean, and add title, year pairs to a list

tvshow_years = []

for row in myList:
    title = row[1]
    raw_year = row[2]
    # look for only the numbers in the raw_year: look in string for 1st occurence of 4 digits in a row
    # if 4 digits in a row is found, return the object to match
    # r' = don't treat backslashes as escape character
    # \d = searches for 0-9
    # {4} = quantifier that says 4 times of the regex character before it 
    match = re.search(r'\d{4}', raw_year)
    if match: 
        year = int(match.group()) # convert year str to int
        # .group() returns the actual text of the object it is applied to
        tvshow_years.append((title, year))
print(tvshow_years)

[('Game of Thrones', 2011), ('Breaking Bad', 2008), ('The Walking Dead', 2010), ('The Big Bang Theory', 2007), ('Dexter', 2006), ('How I Met Your Mother', 2005), ('Friends', 1994), ('Sherlock', 2010), ('Lost', 2004), ('Prison Break', 2005), ('House M.D.', 2004), ('Supernatural', 2005), ('True Detective', 2014), ('Arrow', 2012), ('The Simpsons', 1989), ('House of Cards', 2013), ('Family Guy', 1999), ('Modern Family', 2009), ('South Park', 1997), ('True Blood', 2008), ('The Vampire Diaries', 2009), ('Homeland', 2011), ('Suits', 2011), ('Heroes', 2006), ('Arrested Development', 2003), ('Two and a Half Men', 2003), ('Scrubs', 2001), ('The Office', 2005), ('Firefly', 2002), ('Fringe', 2008), ('American Horror Story', 2011), ('The Sopranos', 1999), ('The Wire', 2002), ('Spartacus: War of the Damned', 2010), ('Sons of Anarchy', 2008), ('Once Upon a Time', 2011), ('Seinfeld', 1989), ('Grey&#x27;s Anatomy', 2005), ('Californication', 2007), ('24', 2001), ('Orange Is the New Black', 2013), ('Doc

In [58]:
# 2 - find top 5 most recent shows 

# newList = sorted(myList, key=None, reverse=True) sorts the list from small to big
# key = tells sorted() whatg value in the list to sort by
    # if key none, compares whole item
    # if key = lambda x: x[1] means 'for item x, return x at index 1 
# True = if true, put biggest first 
sorted_shows = sorted(tvshow_years, key = lambda x: x[1], reverse = True)
#print(sorted_shows)

# 3 - add to top 5 list
top5 = sorted_shows[0:5]
print("The top 5 most recent shows are: ")
print(top5)


The top 5 most recent shows are: 
[('Better Call Saul', 2015), ('Daredevil', 2015), ('Agent Carter', 2015), ('Unbreakable Kimmy Schmidt', 2015), ('The Last Man on Earth', 2015)]


# JSON Files
[JavaScript Object Notation](https://www.json.org/)

A JSON file is a file that stores simple data structures and objects in JavaScript Object Notation (JSON) format, which is a standard data interchange format. It is primarily used for transmitting data between a web application and a server. JSON files are lightweight, text-based, human-readable, and can be edited using a text editor.

While many applications use JSON for data interchange, they may not actually save .json files on the hard drive since the data interchange occurs between Internet-connected computers. However, some applications do enable users to save .json files.

[JavaScript Object Notation](https://www.json.org/) (JSON) is a common data structure that one encounters working with [Application Programming Interfaces](https://en.wikipedia.org/wiki/API) (API)s. They organize through 'depth' by 'nesting' concepts within one another, much like a dictionary does.

for example:

In [None]:
# JSON - like a dictionaries
# used in APIs
# out put for APIs is a json file or structure 

In [64]:
import pprint
dict = {
    'key1':
        'value1',
    'key2':
        'value2',
    'key3':
        'value3'
}
pprint.pprint(dict)

{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}


json notation allows for the engaging of deeper and more complex dictionaries that do not always have the same datatype. 


for exmaple:

In [65]:
complexDict1 = {
    '23fallClasses': 
        ['Api', 'UCD'],
    '23springClasses' : 
        ['theory', 'linguistic anthropology'],
    '22fallClasses': 
        ['WoK', 'Media Sociology'],
    '22springClasses' : 
        ['War and Peace', 'Data II']
}
pprint.pprint(complexDict1)

{'22fallClasses': ['WoK', 'Media Sociology'],
 '22springClasses': ['War and Peace', 'Data II'],
 '23fallClasses': ['Api', 'UCD'],
 '23springClasses': ['theory', 'linguistic anthropology']}


'complex Dict 1' has strings and lists but jsons can even go further and nest to deeper levels besides the first one

In [67]:
complexDict2 = {
    '23fallClasses': { 
        'Api' : {
            'grade': 95.0,
            'teacher':'Brian Keegan'
        }, 
        'UCD':{
            'grade': 98.0,
            'teacher': 'Ricarose Roque'
        }},
    '23springClasses':{ 
        'theory': {
            'grade': 90.0,
            'teacher': 'Jed Burbaker'
        }, 
        'linguistic anthropology':{
            'grade': 95.1,
            'teacher': 'Kira Hall'
        }},
    '22fallClasses':{
        'WoK':{
            'grade': 92.0,
            'teacher': 'Bryan Semaan'
        }, 
        'Media Sociology':{
            'grade': 97.3,
            'teacher': 'Michael McDevitt'
        }},
    '22springClasses':{
        'War and Peace':{
            'grade': 89.0,
            'teacher': 'Jaroslav Tir'
        }, 
        'Data II':{
            'grade': 95.1,
            'teacher': 'Andrew Philips'
        }}
    
}
pprint.pprint(complexDict2)

{'22fallClasses': {'Media Sociology': {'grade': 97.3,
                                       'teacher': 'Michael McDevitt'},
                   'WoK': {'grade': 92.0, 'teacher': 'Bryan Semaan'}},
 '22springClasses': {'Data II': {'grade': 95.1, 'teacher': 'Andrew Philips'},
                     'War and Peace': {'grade': 89.0,
                                       'teacher': 'Jaroslav Tir'}},
 '23fallClasses': {'Api': {'grade': 95.0, 'teacher': 'Brian Keegan'},
                   'UCD': {'grade': 98.0, 'teacher': 'Ricarose Roque'}},
 '23springClasses': {'linguistic anthropology': {'grade': 95.1,
                                                 'teacher': 'Kira Hall'},
                     'theory': {'grade': 90.0, 'teacher': 'Jed Burbaker'}}}


<!-- space -->



Now lets make some of our own and work with it....



In [68]:
#example of a json file
person = {"name":"John", "age":30, "car":None}
pprint.pprint(person)

{'age': 30, 'car': None, 'name': 'John'}


In [47]:
#what data type is it?
type(person)

dict

In [48]:
#grab data at a position
person["name"]

'John'

In [49]:
#get data at another position
person["car"]

In [50]:
# lets try something bigger

movie = {'id': 538,
 'url': 'https://www.tvmaze.com/shows/538/futurama',
 'name': 'Futurama',
 'type': 'Animation',
 'language': 'English',
 'genres': ['Comedy', 'Adventure', 'Science-Fiction'],
 'status': 'Ended',
 'runtime': 30,
 'averageRuntime': 30,
 'premiered': '1999-03-28',
 'ended': '2013-09-04',
 'officialSite': 'http://www.cc.com/shows/futurama',
 'schedule': {'time': '22:00', 'days': ['Wednesday']},
 'rating': {'average': 8.9},
 'weight': 98,
 'network': {'id': 23,
  'name': 'Comedy Central',
  'country': {'name': 'United States',
   'code': 'US',
   'timezone': 'America/New_York'}},
 'webChannel': None,
 'dvdCountry': None,
 'externals': {'tvrage': 3628, 'thetvdb': 73871, 'imdb': 'tt0149460'},
 'image': {'medium': 'https://static.tvmaze.com/uploads/images/medium_portrait/4/11403.jpg',
  'original': 'https://static.tvmaze.com/uploads/images/original_untouched/4/11403.jpg'},
 'summary': '<p><b>Futurama</b> follows pizza guy Philip J. Fry, who reawakens in 31st century New New York after a cryonics lab accident. Now part of the Planet Express delivery crew, Fry travels to the farthest reaches of the universe with his robot buddy Bender and cyclopsian love interest Leela, discovering freaky mutants, intergalactic conspiracies and other strange stuff.</p>',
 'updated': 1643062596,
 '_links': {'self': {'href': 'https://api.tvmaze.com/shows/538'},
  'previousepisode': {'href': 'https://api.tvmaze.com/episodes/49411'}}}

'Beast Machines: Transformers'


In [51]:
#what data type is this? 
type(movie)

dict

In [54]:
#grab data at a position
movie["network"]

{'id': 23,
 'name': 'Comedy Central',
 'country': {'name': 'United States',
  'code': 'US',
  'timezone': 'America/New_York'}}

In [56]:
#get the timezone
movie["network"]["country"]["timezone"]

'America/New_York'

In [57]:
#get an image .jpg name
movie["image"]["original"]

#image': {'medium': 'https://static.tvmaze.com/uploads/images/medium_portrait/4/11403.jpg',
 # 'original': 'https://static.tvmaze.com/uploads/images/original_untouched/4/11403.jpg'},

'https://static.tvmaze.com/uploads/images/original_untouched/4/11403.jpg'

#### reading a json file

json exposes an API familiar to users of the standard library marshal and pickle modules. Encoding basic Python object hierarchies, Compact encoding, Compact encoding, and Specializing JSON object decoding. 

json encoder and decoder docs - [https://docs.python.org/3/library/json.html]

In [73]:
# importing json library
import json

In [74]:
# open sample_users.json
# read json file

f = open("data/sample_users.json", 'r') # assign json file to a variable

data = json.loads(f.read()) 

In [75]:
# look at the data
pprint.pprint(data)

[{'created_at': '2016-11-28T14:10:11.338Z',
  'email': 'test@test.com',
  'email_verified': True,
  'family_name': 'Test',
  'given_name': 'Hello',
  'last_ip': '94.121.163.63',
  'last_login': '2016-12-02T01:17:29.310Z',
  'logins_count': 15,
  'name': 'test@test.com',
  'nickname': 'test',
  'updated_at': '2016-12-02T01:17:29.310Z',
  'user_id': '583c3ac3f38e84297c002546'},
 {'created_at': '2016-11-28T16:00:04.209Z',
  'email': 'test1@test.com',
  'email_verified': True,
  'family_name': 'Test1',
  'given_name': 'Hello1',
  'last_ip': '94.121.168.53',
  'last_login': '2016-11-28T16:00:47.203Z',
  'logins_count': 1,
  'name': 'test1@test.com',
  'nickname': 'test1',
  'updated_at': '2016-11-28T16:00:47.203Z',
  'user_id': '583c5484cb79a5fe593425a9'},
 {'created_at': '2016-11-28T16:12:23.777Z',
  'email': 'aaa@aaa.com',
  'email_verified': True,
  'family_name': 'Dough',
  'given_name': 'John',
  'last_ip': '94.121.168.53',
  'last_login': '2016-11-28T16:12:52.353Z',
  'logins_count': 

In [64]:
#get the type of data
type(data)

list

#### Your data get's imported as list, because in your JSON file the main structure is an Array (squared brackets), which is comparable to a list in Python.

In [65]:
# you can get how long it is
len(data) # 5 items/dictionarie

5

In [67]:
# get the first email 
data[0]["email"] 

'test@test.com'

<!--    -->

#### pprint

The pprint module provides a capability to “pretty-print” arbitrary Python data structures in a form which can be used as input to the interpreter

docs - [https://docs.python.org/3/library/pprint.html]

In [69]:
import pprint # allows u to see data better 

In [70]:
#print original data (w/o pprint)
data

[{'user_id': '583c3ac3f38e84297c002546',
  'email': 'test@test.com',
  'name': 'test@test.com',
  'given_name': 'Hello',
  'family_name': 'Test',
  'nickname': 'test',
  'last_ip': '94.121.163.63',
  'logins_count': 15,
  'created_at': '2016-11-28T14:10:11.338Z',
  'updated_at': '2016-12-02T01:17:29.310Z',
  'last_login': '2016-12-02T01:17:29.310Z',
  'email_verified': True},
 {'user_id': '583c5484cb79a5fe593425a9',
  'email': 'test1@test.com',
  'name': 'test1@test.com',
  'given_name': 'Hello1',
  'family_name': 'Test1',
  'nickname': 'test1',
  'last_ip': '94.121.168.53',
  'logins_count': 1,
  'created_at': '2016-11-28T16:00:04.209Z',
  'updated_at': '2016-11-28T16:00:47.203Z',
  'last_login': '2016-11-28T16:00:47.203Z',
  'email_verified': True},
 {'user_id': '583c57672c7686377d2f66c9',
  'email': 'aaa@aaa.com',
  'name': 'aaa@aaa.com',
  'given_name': 'John',
  'family_name': 'Dough',
  'nickname': 'aaa',
  'last_ip': '94.121.168.53',
  'logins_count': 2,
  'created_at': '2016-11

In [71]:
#now try pprint
pprint.pprint(data)

[{'created_at': '2016-11-28T14:10:11.338Z',
  'email': 'test@test.com',
  'email_verified': True,
  'family_name': 'Test',
  'given_name': 'Hello',
  'last_ip': '94.121.163.63',
  'last_login': '2016-12-02T01:17:29.310Z',
  'logins_count': 15,
  'name': 'test@test.com',
  'nickname': 'test',
  'updated_at': '2016-12-02T01:17:29.310Z',
  'user_id': '583c3ac3f38e84297c002546'},
 {'created_at': '2016-11-28T16:00:04.209Z',
  'email': 'test1@test.com',
  'email_verified': True,
  'family_name': 'Test1',
  'given_name': 'Hello1',
  'last_ip': '94.121.168.53',
  'last_login': '2016-11-28T16:00:47.203Z',
  'logins_count': 1,
  'name': 'test1@test.com',
  'nickname': 'test1',
  'updated_at': '2016-11-28T16:00:47.203Z',
  'user_id': '583c5484cb79a5fe593425a9'},
 {'created_at': '2016-11-28T16:12:23.777Z',
  'email': 'aaa@aaa.com',
  'email_verified': True,
  'family_name': 'Dough',
  'given_name': 'John',
  'last_ip': '94.121.168.53',
  'last_login': '2016-11-28T16:12:52.353Z',
  'logins_count': 

In [72]:
#what data type is this?
type(data)

list

### Excercise 1:
import nationalParks.json

Create a dictionary of the number of national parks per state

Hint: go to the example of counting genres

In [79]:
# 1 - read in file
f = open("data/nationalParks.json", 'r')
np_data = json.loads(f.read()) 
pprint.pprint(np_data)

[{'area': {'acres': '49,057.36', 'square_km': '198.5'},
  'coordinates': {'latitude': 44.35, 'longitude': -68.21},
  'date_established_readable': 'February 26, 1919',
  'date_established_unix': -1604599200,
  'description': 'Covering most of Mount Desert Island and other coastal '
                 'islands, Acadia features the tallest mountain on the '
                 'Atlantic coast of the United States, granite peaks, ocean '
                 'shoreline, woodlands, and lakes. There are freshwater, '
                 'estuary, forest, and intertidal habitats.',
  'id': 'park_acadia',
  'image': {'attribution': 'PixelBay/@Skeeze',
            'attribution_url': 'https://pixabay.com/en/users/skeeze-272447/',
            'url': 'acadia.jpg'},
  'nps_link': 'https://www.nps.gov/acad/index.htm',
  'states': [{'id': 'state_maine', 'title': 'Maine'}],
  'title': 'Acadia',
  'visitors': '3,303,393',
  'world_heritage_site': False},
 {'area': {'acres': '8,256.67', 'square_km': '33.4'},
  'coo

In [83]:
# 2 - Create a dictionary of the number of national parks per state
    # - json np file is list of national parks
    # - each item is a park
    # - loop thru each item at index 'state', add state to dictonary and increase frequency by 1 

# 3 - create empty dictionary to hold state counts
npStateCount = {}

# 4 - iterate thru each national park to extract the state 
for np in np_data: # each of these is a nat park
    # 4a - iterate thru each state for parks state list
    for state in np["states"]:  
    
        if state["title"] in npStateCount.keys():
            npStateCount[state["title"]] += 1
        else:
            npStateCount[state["title"]] = 1

npStateCount

{'Maine': 1,
 'American Samoa': 1,
 'Utah': 5,
 'South Dakota': 2,
 'Texas': 2,
 'Florida': 3,
 'Colorado': 4,
 'New Mexico': 1,
 'California': 9,
 'South Carolina': 1,
 'Oregon': 1,
 'Ohio': 1,
 'Nevada': 2,
 'Alaska': 8,
 'Montana': 2,
 'Arizona': 3,
 'Wyoming': 2,
 'Tennessee': 1,
 'North Carolina': 1,
 'Hawaii': 2,
 'Arkansas': 1,
 'Michigan': 1,
 'Kentucky': 1,
 'Washington': 3,
 'Virginia': 1,
 'North Dakota': 1,
 'US Virgin Islands': 1,
 'Minnesota': 1,
 'Idaho': 1}

In [81]:
# this is a list 
np_data[0]["states"][0]["title"]

'Maine'