# Week 3 Day 2: CSV Files

* CSV Files
* JSON Files



## CSV Files
CSV stands for comma-separated values.

CSV is the most common import and export format for spreadsheets and databases. The format was in use for many years before anyone attempted to describe it in a standardized way.
The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer.
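To make the point about delimiters and quoting characters concrete, here is a minimal sketch that parses the same record written in two styles. The sample strings and the `parse` helper are made up for illustration, and `io.StringIO` stands in for a real file.

```python
import csv
import io

# Two hypothetical exports of the same data:
# comma-delimited with double quotes, and semicolon-delimited with single quotes.
comma_text = 'title,year\n"Monsters, Inc.",2001\n'
semicolon_text = "title;year\n'Monsters, Inc.';2001\n"


def parse(text, **fmtparams):
    """Parse CSV text into a list of rows, passing format options to csv.reader."""
    return list(csv.reader(io.StringIO(text), **fmtparams))


rows_a = parse(comma_text)  # defaults: delimiter=',' and quotechar='"'
rows_b = parse(semicolon_text, delimiter=';', quotechar="'")

print(rows_a)
print(rows_a == rows_b)  # both styles yield the same rows
```

Only the format parameters change between the two calls; the code that consumes the rows stays the same.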

In [7]:
# python has a library that simplifies reading csv files
import csv

#### `csv.reader(csvfile, dialect='excel', **fmtparams)`

Return a reader object that will process lines from the given csvfile. A csvfile must be an iterable of strings, each in the reader’s defined csv format.

We will make a `csvReader` to read the CSV file. Calling `csv.reader` creates an object that we can loop over to get the rows one at a time. A `csvReader` is a stream-based parser for CSV files.

Link to documentation and info: [https://docs.python.org/3/library/csv.html](https://docs.python.org/3/library/csv.html)
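As a small sketch of what "an iterable of strings" and "stream based" mean in practice, the reader below is fed a plain list of strings instead of a file. The first data row comes from the movie data used in this lesson; the second is a made-up placeholder.

```python
import csv

# Any iterable of strings works as the csvfile argument, not only an open file.
lines = [
    'rank,title,rating',
    '0,The Shawshank Redemption,9.3',
    '1,Example Movie,8.0',  # hypothetical row for illustration
]

reader = csv.reader(lines)

header = next(reader)          # the reader hands out one row per request
lines_read_so_far = reader.line_num

rows = list(reader)            # consume the remaining rows

print(header)
print(lines_read_so_far, reader.line_num)
print(rows)
```

Because the reader pulls lines on demand, it never needs the whole file in memory at once.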

In [8]:
#create a list to append the rows
myList = []

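# we use with because it automatically closes the file when we're done with it
# encoding='utf-8' is an assumption about this file; newline='' is what the csv docs recommend
with open('data/movies.csv', 'r', encoding='utf-8', newline='') as f:

    # f is the file handle: not the contents of the file, but the way python "talks" to the file itself
    # csvReader wraps that handle and parses one row at a time
    csvReader = csv.reader(f, delimiter=',', quotechar='"')
    
    # skip the header
    next(csvReader)
    
    for row in csvReader:
        myList.append(row)
    
# out here (not indented), the file is closed


If the `encoding` argument is left out, Python falls back to the platform default (often a legacy code page on Windows), and the cell can fail with an error like this one:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2614: character maps to <undefined>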
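The error above is what a codec mismatch looks like. The sketch below reproduces it without the movies file, assuming the file is UTF-8 (the source does not say which encoding it uses): the character `Á` is stored in UTF-8 as two bytes, the second of which is `0x81`, and the Windows code page `cp1252` has no character assigned to that byte.

```python
# '\u00c1' is the character 'Á'; in UTF-8 it becomes the two bytes 0xC3 0x81.
raw = '\u00c1'.encode('utf-8')

try:
    raw.decode('cp1252')       # cp1252 leaves byte 0x81 undefined
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print(err)

text = raw.decode('utf-8')     # decoding with the matching codec works
print(text)
```

Passing `encoding='utf-8'` to `open` applies the same fix at the file level.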

In [10]:
# Data access

In [9]:
#get the first sublist
myList[0]

['0',
 'The Shawshank Redemption',
 '(1994)',
 '142 mins.',
 '9.3',
 '1,432,740',
 "[u'Crime', u'Drama']",
 'Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.',
 '$28.3M']

In [10]:
#get the actual text we want
myList[0][7]

'Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.'
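Every field that `csv.reader` returns with its default settings is a string, even the ones that look numeric. The following sketch converts a few fields from the first row shown above; the variable names are illustrative, and the string handling assumes every row follows the same pattern as this one.

```python
# The first few fields of the row shown above; every field is a string.
row = ['0', 'The Shawshank Redemption', '(1994)', '142 mins.', '9.3', '1,432,740']

year = int(row[2].strip('()'))         # '(1994)'    -> 1994
minutes = int(row[3].split()[0])       # '142 mins.' -> 142
rating = float(row[4])                 # '9.3'       -> 9.3
votes = int(row[5].replace(',', ''))   # '1,432,740' -> 1432740

print(year, minutes, rating, votes)
```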

In [11]:
#print out every genre that is mentioned in the csv file
for movie in myList:
    print(movie[6])

[u'Crime', u'Drama']
[u'Action', u'Crime', u'Drama']
[u'Action', u'Mystery', u'Sci-Fi', u'Thriller']
[u'Drama']
[u'Crime', u'Drama']
[u'Adventure', u'Fantasy']
[u'Adventure', u'Fantasy']
[u'Action', u'Sci-Fi']
[u'Drama', u'Romance']
[u'Crime', u'Drama']
[u'Action', u'Thriller']
[u'Adventure', u'Fantasy']
[u'Drama', u'Mystery', u'Thriller']
[u'Action', u'Drama']
[u'Action', u'Adventure']
[u'Action', u'Adventure', u'Sci-Fi']
[u'Action', u'Adventure', u'Fantasy', u'Sci-Fi']
[u'Western']
[u'Adventure', u'Drama', u'Fantasy']
[u'Adventure', u'Drama', u'Fantasy']
[u'Adventure', u'Drama', u'Fantasy']
[u'Adventure', u'Drama', u'Fantasy']
[u'Action', u'Adventure', u'Fantasy', u'Sci-Fi']
[u'Action', u'Drama', u'War']
[u'Crime', u'Drama', u'Thriller']
[u'Drama', u'Thriller']
[u'Biography', u'Drama', u'History']
[u'Mystery', u'Thriller']
[u'Adventure', u'Drama', u'War']
[u'Drama']
[u'Drama', u'Mystery', u'Thriller']
[u'Adventure', u'Fantasy']
[u'Drama', u'Romance']
[u'Action', u'Drama', u'Thriller'

In [14]:
#can we clean this data?
genreLyst_clean = []

for movie in myList:
    genre = movie[6]
    genreLyst = genre.split(',')
    
    for index, item in enumerate(genreLyst):
        
        genreLyst[index] = genreLyst[index].replace("[", "").replace("]","")
        
    genreLyst_clean.append(genreLyst)

print(genreLyst_clean)

[["u'Crime'", " u'Drama'"], ["u'Action'", " u'Crime'", " u'Drama'"], ["u'Action'", " u'Mystery'", " u'Sci-Fi'", " u'Thriller'"], ["u'Drama'"], ["u'Crime'", " u'Drama'"], ["u'Adventure'", " u'Fantasy'"], ["u'Adventure'", " u'Fantasy'"], ["u'Action'", " u'Sci-Fi'"], ["u'Drama'", " u'Romance'"], ["u'Crime'", " u'Drama'"], ["u'Action'", " u'Thriller'"], ["u'Adventure'", " u'Fantasy'"], ["u'Drama'", " u'Mystery'", " u'Thriller'"], ["u'Action'", " u'Drama'"], ["u'Action'", " u'Adventure'"], ["u'Action'", " u'Adventure'", " u'Sci-Fi'"], ["u'Action'", " u'Adventure'", " u'Fantasy'", " u'Sci-Fi'"], ["u'Western'"], ["u'Adventure'", " u'Drama'", " u'Fantasy'"], ["u'Adventure'", " u'Drama'", " u'Fantasy'"], ["u'Adventure'", " u'Drama'", " u'Fantasy'"], ["u'Adventure'", " u'Drama'", " u'Fantasy'"], ["u'Action'", " u'Adventure'", " u'Fantasy'", " u'Sci-Fi'"], ["u'Action'", " u'Drama'", " u'War'"], ["u'Crime'", " u'Drama'", " u'Thriller'"], ["u'Drama'", " u'Thriller'"], ["u'Biography'", " u'Drama'", 

In [15]:
#make an empty dictionary for movie genres

genreCount = {}

#iterate through the cleaned lists and count the number of times each movie genre is mentioned

for movie in genreLyst_clean:
    for genre in movie:
        # checking membership on the dictionary itself looks at its keys
        if genre in genreCount:
            genreCount[genre] += 1
        else:
            genreCount[genre] = 1

genreCount

{"u'Crime'": 302,
 " u'Drama'": 1241,
 "u'Action'": 941,
 " u'Crime'": 461,
 " u'Mystery'": 468,
 " u'Sci-Fi'": 504,
 " u'Thriller'": 1226,
 "u'Drama'": 661,
 "u'Adventure'": 235,
 " u'Fantasy'": 452,
 " u'Romance'": 755,
 " u'Adventure'": 500,
 "u'Western'": 9,
 " u'War'": 153,
 "u'Biography'": 157,
 " u'History'": 129,
 "u'Mystery'": 44,
 " u'Biography'": 38,
 " u'Comedy'": 429,
 "u'Animation'": 218,
 " u'Family'": 347,
 " u'Musical'": 85,
 "u'Comedy'": 972,
 "u'Horror'": 198,
 " u'Horror'": 191,
 "u'Sci-Fi'": 19,
 " u'Sport'": 129,
 " u'Action'": 48,
 "u'Family'": 2,
 " u'Music'": 106,
 " u'Western'": 49,
 "u'Fantasy'": 10,
 "u'Romance'": 3,
 "u'Musical'": 1,
 " u'Film-Noir'": 10,
 "u'Thriller'": 8,
 "u'Documentary'": 33,
 "u'Film-Noir'": 3,
 " u'Talk-Show'": 1,
 " u'Animation'": 3,
 "u'History'": 1,
 '': 1,
 "u'Adult'": 1}

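The counts above are split across keys such as `"u'Drama'"` and `" u'Drama'"` because the cleaning step removes only the square brackets. One possible alternative, sketched here on a few hand-copied fields rather than the full file, is to let `ast.literal_eval` parse each field as a Python list and let `collections.Counter` do the counting. The `parse_genres` helper and the sample list are made up for illustration.

```python
import ast
from collections import Counter

# A few genre fields in the same shape as column 6 of the movie rows,
# plus an empty field, since the output above contains an empty key.
genre_fields = [
    "[u'Crime', u'Drama']",
    "[u'Action', u'Crime', u'Drama']",
    "",
    "[u'Drama']",
]


def parse_genres(field):
    """Turn "[u'Crime', u'Drama']" into ['Crime', 'Drama']; empty fields give []."""
    if not field.strip():
        return []
    return ast.literal_eval(field)


genre_counts = Counter()
for field in genre_fields:
    genre_counts.update(parse_genres(field))

print(genre_counts.most_common())
```

`ast.literal_eval` only evaluates literals (strings, numbers, lists, and so on), so it is a safer choice than `eval` for text read from a file.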
### Exercise 1: 

Import `tv_shows.csv` and find the 5 most recently made TV shows.

## JSON Files
[JavaScript Object Notation](https://www.json.org/)

A JSON file is a file that stores simple data structures and objects in JavaScript Object Notation (JSON) format, which is a standard data interchange format. It is primarily used for transmitting data between a web application and a server. JSON files are lightweight, text-based, human-readable, and can be edited using a text editor.

While many applications use JSON for data interchange, they may not actually save .json files on the hard drive since the data interchange occurs between Internet-connected computers. However, some applications do enable users to save .json files.

JSON is a common data format that one encounters when working with [Application Programming Interfaces](https://en.wikipedia.org/wiki/API) (APIs). JSON documents organize data by nesting values inside one another, much like a Python dictionary whose values can themselves be dictionaries or lists.

For example:

In [None]:
# named simpleDict rather than dict, so the built-in dict type is not overwritten
simpleDict = {
    'key1':
        'value1',
    'key2':
        'value2',
    'key3':
        'value3'
}

JSON notation allows deeper and more complex structures whose values do not all have to share the same data type.


For example:

In [None]:
complexDict1 = {
    '23fallClasses': 
        ['Api', 'UCD'],
    '23springClasses' : 
        ['theory', 'linguistic anthropology'],
    '22fallClasses': 
        ['WoK', 'Media Sociology'],
    '22springClasses' : 
        ['War and Peace', 'Data II']
}

`complexDict1` maps strings to lists, but JSON can go further and nest additional levels below the first one:

In [None]:
complexDict2 = {
    '23fallClasses': { 
        'Api' : {
            'grade': 95.0,
            'teacher':'Brian Keegan'
        }, 
        'UCD':{
            'grade': 98.0,
            'teacher': 'Ricarose Roque'
        }},
    '23springClasses':{ 
        'theory': {
            'grade': 90.0,
            'teacher': 'Jed Burbaker'
        }, 
        'linguistic anthropology':{
            'grade': 95.1,
            'teacher': 'Kira Hall'
        }},
    '22fallClasses':{
        'WoK':{
            'grade': 92.0,
            'teacher': 'Bryan Semaan'
        }, 
        'Media Sociology':{
            'grade': 97.3,
            'teacher': 'Michael McDevitt'
        }},
    '22springClasses':{
        'War and Peace':{
            'grade': 89.0,
            'teacher': 'Jaroslav Tir'
        }, 
        'Data II':{
            'grade': 95.1,
            'teacher': 'Andrew Philips'
        }}
    
}
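As a sketch of how such a nested structure is used, the snippet below chains keys to reach one value and loops over the levels to compute an average grade per semester. It uses a smaller dictionary with the same semester, class, details shape; the values are copied from `complexDict2`, and the names `semesters` and `average_by_semester` are illustrative.

```python
# Same three-level shape as complexDict2: semester -> class -> details.
semesters = {
    '23fallClasses': {
        'Api': {'grade': 95.0, 'teacher': 'Brian Keegan'},
        'UCD': {'grade': 98.0, 'teacher': 'Ricarose Roque'},
    },
    '22fallClasses': {
        'WoK': {'grade': 92.0, 'teacher': 'Bryan Semaan'},
    },
}

# Chain one key per level to reach a single value.
ucd_grade = semesters['23fallClasses']['UCD']['grade']

# Loop over the outer level, then collect values from the inner level.
average_by_semester = {}
for semester, classes in semesters.items():
    grades = [details['grade'] for details in classes.values()]
    average_by_semester[semester] = sum(grades) / len(grades)

print(ucd_grade)
print(average_by_semester)
```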




Now let's make some of our own and work with them.



In [17]:
#a python dictionary shaped like a json object (json itself would write null instead of None)
person = {"name":"John", "age":30, "car":None}

In [18]:
#what data type is it?
type(person)

dict

In [23]:
#look up a value by its key (this one is None, so the notebook shows no output)
person['car']

In [22]:
#look up the value stored under another key
person['name']

'John'
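The `person` object above is a Python dictionary, not JSON text. The short sketch below shows the difference by converting it to a JSON string with `json.dumps` and back with `json.loads`; note how Python's `None` becomes JSON's `null`. The variable names `as_json` and `back` are illustrative.

```python
import json

person = {"name": "John", "age": 30, "car": None}

as_json = json.dumps(person)   # Python dict -> JSON text (a str)
back = json.loads(as_json)     # JSON text -> Python dict

print(type(as_json), as_json)
print(type(back), back)
```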

In [24]:
# let's try something bigger

movie = {'id': 538,
 'url': 'https://www.tvmaze.com/shows/538/futurama',
 'name': 'Futurama',
 'type': 'Animation',
 'language': 'English',
 'genres': ['Comedy', 'Adventure', 'Science-Fiction'],
 'status': 'Ended',
 'runtime': 30,
 'averageRuntime': 30,
 'premiered': '1999-03-28',
 'ended': '2013-09-04',
 'officialSite': 'http://www.cc.com/shows/futurama',
 'schedule': {'time': '22:00', 'days': ['Wednesday']},
 'rating': {'average': 8.9},
 'weight': 98,
 'network': {'id': 23,
  'name': 'Comedy Central',
  'country': {'name': 'United States',
   'code': 'US',
   'timezone': 'America/New_York'}},
 'webChannel': None,
 'dvdCountry': None,
 'externals': {'tvrage': 3628, 'thetvdb': 73871, 'imdb': 'tt0149460'},
 'image': {'medium': 'https://static.tvmaze.com/uploads/images/medium_portrait/4/11403.jpg',
  'original': 'https://static.tvmaze.com/uploads/images/original_untouched/4/11403.jpg'},
 'summary': '<p><b>Futurama</b> follows pizza guy Philip J. Fry, who reawakens in 31st century New New York after a cryonics lab accident. Now part of the Planet Express delivery crew, Fry travels to the farthest reaches of the universe with his robot buddy Bender and cyclopsian love interest Leela, discovering freaky mutants, intergalactic conspiracies and other strange stuff.</p>',
 'updated': 1643062596,
 '_links': {'self': {'href': 'https://api.tvmaze.com/shows/538'},
  'previousepisode': {'href': 'https://api.tvmaze.com/episodes/49411'}}}

In [25]:
#what data type is this? 
type(movie)

dict

In [27]:

#look up a nested dictionary by its key

movie['network']

{'id': 23,
 'name': 'Comedy Central',
 'country': {'name': 'United States',
  'code': 'US',
  'timezone': 'America/New_York'}}

In [30]:
#get the timezone
movie['network']['country']['timezone']

'America/New_York'

In [32]:
#get the URL of the original .jpg image

movie['image']['original']

'https://static.tvmaze.com/uploads/images/original_untouched/4/11403.jpg'
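Chained lookups like the ones above fail when an intermediate value is missing or `None`, as `'webChannel'` is in this record. Here is a minimal sketch of a guarded lookup; `dig` is a hypothetical helper written for this example, not part of any library, and `show` is a trimmed copy of the record above.

```python
# A trimmed copy of the record above, keeping one None-valued key.
show = {
    'name': 'Futurama',
    'network': {'name': 'Comedy Central', 'country': {'code': 'US'}},
    'webChannel': None,
}


def dig(obj, *keys, default=None):
    """Follow keys one level at a time; return default if any step is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


print(dig(show, 'network', 'country', 'code'))
print(dig(show, 'webChannel', 'name', default='n/a'))
```

Writing `show['webChannel']['name']` directly would raise a `TypeError`, because `None` cannot be indexed.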

#### Reading a JSON file

The `json` module exposes an API familiar to users of the standard library `marshal` and `pickle` modules. Its documentation covers encoding basic Python object hierarchies, compact encoding, and specializing JSON object decoding.

JSON encoder and decoder docs: [https://docs.python.org/3/library/json.html](https://docs.python.org/3/library/json.html)
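As a quick illustration of the encoding options mentioned above, the sketch below encodes one small dictionary three ways with `json.dumps`. The `record` dictionary is a trimmed, illustrative version of the show data used earlier.

```python
import json

record = {'name': 'Futurama', 'genres': ['Comedy', 'Adventure'], 'runtime': 30}

default_text = json.dumps(record)                          # spaces after , and :
compact_text = json.dumps(record, separators=(',', ':'))   # no extra whitespace
indented_text = json.dumps(record, indent=2)               # one item per line

print(default_text)
print(compact_text)
print(indented_text)
```

All three strings decode back to the same dictionary with `json.loads`; they differ only in whitespace.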

In [33]:
# importing json library
import json

In [18]:
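# open sample_users.json
# (the path is an assumption; adjust it to wherever the file is saved)
with open('sample_users.json', 'r', encoding='utf-8') as f:
    # read json file: json.load parses the file contents into python objects
    data = json.load(f)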

In [34]:
# look at the data
data


In [20]:
#get the type of data
type(data)

Your data gets imported as a list because the main structure in your JSON file is an array (square brackets), which is comparable to a list in Python.
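The mapping from JSON's top-level structure to a Python type can be checked with a small stand-in file. In this sketch the JSON text, the temporary file name, and the example email addresses are all made up; they only mimic the general shape of a list of user records.

```python
import json
import os
import tempfile

# Hypothetical stand-in for a users file: a top-level JSON array of objects.
json_text = '[{"email": "a@example.com", "active": true}, {"email": "b@example.com", "active": false}]'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'users.json')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(json_text)
    with open(path, 'r', encoding='utf-8') as f:
        users = json.load(f)

print(type(users))               # a JSON array becomes a list
print(type(users[0]))            # a JSON object becomes a dict
print(users[0]['active'])        # JSON true becomes Python True
print(type(json.loads('{"a": 1}')))  # a top-level object becomes a dict
```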

In [21]:
# you can get how long it is
len(data)

In [22]:
# get the first email
data[0]['email']


#### pprint

The `pprint` module provides a capability to “pretty-print” arbitrary Python data structures in a form which can be used as input to the interpreter.

Docs: [https://docs.python.org/3/library/pprint.html](https://docs.python.org/3/library/pprint.html)
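To show what "can be used as input to the interpreter" means, the sketch below formats a small nested dictionary with `pprint.pformat` and then parses the formatted text back with `ast.literal_eval`. The `show` dictionary is a trimmed, illustrative version of the record used earlier, and `width=40` is an arbitrary choice to force line wrapping.

```python
import ast
import pprint

show = {
    'name': 'Futurama',
    'genres': ['Comedy', 'Adventure', 'Science-Fiction'],
    'network': {'id': 23, 'name': 'Comedy Central'},
}

# pformat returns the pretty-printed text instead of printing it.
formatted = pprint.pformat(show, width=40)
print(formatted)

# The formatted text is still valid Python, so it can be parsed back.
rebuilt = ast.literal_eval(formatted)
print(rebuilt == show)
```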

In [35]:
import pprint

In [23]:
#print original data (w/o pprint)
print(data)


In [24]:
#now try pprint
pprint.pprint(data)

In [25]:
#what data type is this?
type(data)

### Exercise 2:
Import `nationalParks.json`.

Create a dictionary of the number of national parks per state.

Hint: revisit the genre-counting example above.