# Collection Techniques Practical Exercise

In this practical exercise we will explore ways to get data into your data science environment. We will use native Python methods and modules to import some data and have a look at it. Then we will look for ways to get information from external sources.

### Reading Data: CSVs

One way to get data into our analytic environment is to read .csv files, whether provided by sensors or exported from other sources like *Kibana* or *Gabriel Nimbus*. Let's look at how we might do this. As always, Python has *batteries included*: the built-in `csv` module is great for basic data-munging needs.

In [2]:
import csv
from pprint import pprint

Let's open a csv file and use csv's `reader` function to read in the data.

In [3]:
csv_file = open("data/health_inspection_chi_sample.csv")

reader = csv.reader(csv_file)

`csv.reader` returns an iterator that you may step through deliberately or loop over with a for loop. To get the first item from the iterator, pass the iterator to the built-in `next()` function:

In [11]:
headers = next(reader)

`csv.reader` conveniently splits the .csv text into lists. Each row becomes a list, with the first object in reader (the first row) in this case being the table column names. Below is a nicely printed list of the column names for the example data.

In [12]:
pprint(headers)

['address',
 'aka_name',
 'city',
 'dba_name',
 'facility_type',
 'inspection_date',
 'inspection_id',
 'inspection_type',
 'latitude',
 'license_',
 'location',
 'longitude',
 'results',
 'risk',
 'state',
 'violations',
 'zip']


In [6]:
line = next(reader)

In [7]:
pprint(line)

['5255 W MADISON ST ',
 'RED SNAPPER FISH CHICKEN & PIZZA',
 'CHICAGO',
 'RED SNAPPER FISH CHICKEN & PIZZA',
 'Restaurant',
 '2016-09-26T00:00:00.000',
 '1965287',
 'Canvass',
 '41.880236543865834',
 '1991820.0',
 "{'type': 'Point', 'coordinates': [-87.757220392117, 41.880236543866]}",
 '-87.7572203921175',
 'Pass w/ Conditions',
 'Risk 1 (High)',
 'IL',
 '35. WALLS, CEILINGS, ATTACHED EQUIPMENT CONSTRUCTED PER CODE: GOOD REPAIR, '
 'SURFACES CLEAN AND DUST-LESS CLEANING METHODS - Comments: MUST CLEAN THE '
 'WALLS AT WALL BASE NEAR THE MIXER IN REAR OF PREMISES AND THE PREP AREA OF '
 'FOOD SPILLS AND CLEAN THE WALL VENT IN PREP AREA ,INSTRUCTED TO CLEAN AND '
 'MAINTAIN AREA | 33. FOOD AND NON-FOOD CONTACT EQUIPMENT UTENSILS CLEAN, FREE '
 'OF ABRASIVE DETERGENTS - Comments: MUST CLEAN THE INTERIOR PANEL OF THE ICE '
 'MACHINE IN REAR OF PREMISES | 34. FLOORS: CONSTRUCTED PER CODE, CLEANED, '
 'GOOD REPAIR, COVING INSTALLED, DUST-LESS CLEANING METHODS USED - Comments: '
 'MUST CLEAN 

You can control the behavior of `csv.reader` through a `Dialect` object. By default, `csv.reader` uses a Dialect object called "excel." Here let's look at the attributes of the excel dialect.

In [13]:
items = csv.excel.__dict__.items()

pprint({key: value for key, value in items if not key.startswith("_")})

{'delimiter': ',',
 'doublequote': True,
 'lineterminator': '\r\n',
 'quotechar': '"',
 'quoting': 0,
 'skipinitialspace': False}
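
As a minimal sketch of overriding the default dialect, the snippet below reads pipe-delimited text from an in-memory buffer instead of a file. The sample rows and the `pipe_text` name are made up for illustration; `delimiter` is one of the formatting parameters that `csv.reader` accepts alongside (or instead of) a `dialect`.

```python
import csv
import io

# Hypothetical pipe-delimited sample standing in for a real file.
pipe_text = "city|state|zip\nCHICAGO|IL|60644\nCHICAGO|IL|60639\n"

# Keyword arguments such as `delimiter` override the default "excel" settings.
reader = csv.reader(io.StringIO(pipe_text), delimiter="|")

headers = next(reader)  # first row: the column names
rows = list(reader)     # remaining rows, each a list of strings

print(headers)
print(rows)
```

Every cell still comes back as a string, so any type conversion is left to you.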


You might also find `DictReader` to be useful. 

First, let's back up in the file to the beginning.

In [14]:
csv_file.seek(0)

0

In [15]:
reader = csv.DictReader(csv_file)

In [16]:
pprint(next(reader))

OrderedDict([('address', '5255 W MADISON ST '),
             ('aka_name', 'RED SNAPPER FISH CHICKEN & PIZZA'),
             ('city', 'CHICAGO'),
             ('dba_name', 'RED SNAPPER FISH CHICKEN & PIZZA'),
             ('facility_type', 'Restaurant'),
             ('inspection_date', '2016-09-26T00:00:00.000'),
             ('inspection_id', '1965287'),
             ('inspection_type', 'Canvass'),
             ('latitude', '41.880236543865834'),
             ('license_', '1991820.0'),
             ('location',
              "{'type': 'Point', 'coordinates': [-87.757220392117, "
              '41.880236543866]}'),
             ('longitude', '-87.7572203921175'),
             ('results', 'Pass w/ Conditions'),
             ('risk', 'Risk 1 (High)'),
             ('state', 'IL'),
             ('violations',
              '35. WALLS, CEILINGS, ATTACHED EQUIPMENT CONSTRUCTED PER CODE: '
              'GOOD REPAIR, SURFACES CLEAN AND DUST-LESS CLEANING METHODS - '
              'Comments: 

As you can see, `csv.DictReader` converts each row into an ordered dictionary, joining the header row information with each cell. Let's close the file.

In [24]:
csv_file.close()
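
If you would rather not call `close` by hand, a `with` block does it for you. The following is a small sketch, assuming a made-up two-row sample held in memory rather than the real inspections file; the `sample` and `failed` names are illustrative only.

```python
import csv
import io

# Hypothetical sample with the same kind of columns as the inspections data.
sample = "dba_name,results\nTAQUERIA MORELOS,Pass\nTUXEDO JUNCTION #2,Fail\n"

# The `with` block closes the file-like object automatically on exit.
with io.StringIO(sample) as csv_file:
    records = list(csv.DictReader(csv_file))

# Each record is keyed by the header row, so fields can be looked up by name.
failed = [record["dba_name"] for record in records if record["results"] == "Fail"]
print(failed)
```

The same pattern works with `open(...)` in place of `io.StringIO(...)`.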

One last handy trick with the `csv` module is to use `Sniffer` to determine the csv dialect for you.

In [25]:
file_name = "data/health_inspection_chi_sample.csv"

with open(file_name) as csv_file:
    
    try:
        dialect = csv.Sniffer().sniff(csv_file.read(1024))
    except csv.Error as err:
        # log that this file format couldn't be deduced
        print(f"The format of {file_name} could not be detected.")
    else:
        csv_file.seek(0)
    
        dta = csv.reader(csv_file, dialect=dialect)

A big part of increased productivity is saving yourself some work later. The first rule of dealing with data is probably something like "The data is against you. Act accordingly."

We set this block of code up to be pretty paranoid. Software engineers call this **defensive programming**. It's a good habit to get into when you're doing any data science work in Python. 

There are three things to note here. The first is the use of `Sniffer` at all. As long as `file_name` is in a format that `Sniffer` can detect, we'll be able to read it without hard-coding the delimiter or quoting rules.

The second is the `try/except/else` block. If the code in the `try` block raises an exception that matches the `except` clause, the code in the `except` block runs. If no exception is raised, the *optional* `else` block runs.

Let's take a look at a toy example to fix ideas.

In [26]:
try:
    1/0
except:
    print("Something went wrong.")

Something went wrong.


In [27]:
try:
    1/0
except ZeroDivisionError as err:
    print("Something went wrong.")
    raise  # re-raise the caught exception

Something went wrong.


ZeroDivisionError: division by zero

In [28]:
try:
    1/0
except FileNotFoundError:
    print("This error isn't raised, so we're not here.")

ZeroDivisionError: division by zero
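
None of the toy cells above exercises the `else` branch, so here is a minimal sketch that does. It wraps `Sniffer` in a hypothetical helper called `detect_delimiter`; the helper name, the `candidates` default, and the sample strings are assumptions made for this illustration.

```python
import csv


def detect_delimiter(sample, candidates=",;\t|"):
    """Return the sniffed delimiter, or None if detection fails."""
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        # Only csv.Error is handled here; any other exception propagates.
        return None
    else:
        # Runs only when the try block raised nothing.
        return dialect.delimiter


print(detect_delimiter("a;b;c\n1;2;3\n4;5;6\n"))
print(detect_delimiter("oneword\nanother\n"))
```

Keeping the success path in `else` means the `try` block contains only the one call that is expected to fail.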

The final thing to note in the `Sniffer` block above is the use of `print` to provide some information about what went wrong. Logging is another really good habit to get into, and print statements are the simplest way to log the behavior of your code.

In practice, you probably don't want to use `print`. You want to use the [logging](https://docs.python.org/3/library/logging.html) module, but we're not going to talk about best practices in logging any further today.
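
For reference, swapping `print` for `logging` can be sketched in a few lines. The logger name `collection_exercise` and the helper `to_float` are hypothetical, and this is only one reasonable setup rather than a recommended configuration.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s:%(message)s")
logger = logging.getLogger("collection_exercise")  # hypothetical logger name


def to_float(value, field):
    """Convert a text cell to float, logging a warning instead of printing."""
    try:
        return float(value)
    except ValueError:
        # Arguments are passed separately so logging formats them lazily.
        logger.warning("Could not convert %s=%r to float", field, value)
        return None


print(to_float("41.880236543865834", "latitude"))
print(to_float("n/a", "latitude"))
```

Unlike `print`, the warning carries a severity level and a logger name, so it can be filtered or redirected later without touching the function.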

### Reading Data: json

Perhaps the second most common file format after CSVs is the `JSON` format. JSON stands for JavaScript Object Notation. You will often encounter json files when reading data from an API, for example, or when reading data exported from Elasticsearch.

Each line in the file `data/health_inspection_chi_sample.json` is a single json object that represents the same data as above.

In [29]:
!head -n 1 data/health_inspection_chi_sample.json

{"address":"5255 W MADISON ST ","aka_name":"RED SNAPPER FISH CHICKEN & PIZZA","city":"CHICAGO","dba_name":"RED SNAPPER FISH CHICKEN & PIZZA","facility_type":"Restaurant","inspection_date":"2016-09-26T00:00:00.000","inspection_id":1965287,"inspection_type":"Canvass","latitude":41.8802365439,"license_":1991820.0,"location":{"type":"Point","coordinates":[-87.7572203921,41.8802365439]},"longitude":-87.7572203921,"results":"Pass w\/ Conditions","risk":"Risk 1 (High)","state":"IL","violations":"35. WALLS, CEILINGS, ATTACHED EQUIPMENT CONSTRUCTED PER CODE: GOOD REPAIR, SURFACES CLEAN AND DUST-LESS CLEANING METHODS - Comments: MUST CLEAN THE WALLS AT WALL BASE NEAR THE MIXER IN REAR OF PREMISES AND THE PREP AREA OF FOOD SPILLS AND CLEAN THE WALL VENT IN PREP AREA ,INSTRUCTED TO CLEAN AND MAINTAIN AREA | 33. FOOD AND NON-FOOD CONTACT EQUIPMENT UTENSILS CLEAN, FREE OF ABRASIVE DETERGENTS - Comments: MUST CLEAN THE INTERIOR PANEL OF THE ICE MACHINE IN REAR OF PREMISES | 34. FLOORS: CONSTRUCTED PE

We can use the `json` module to read and manipulate json data.

In [30]:
import json

Since each line is a json object here, we need to iterate over the file and parse each line. We use the `json.loads` function here for "load string." The similar function `json.load` takes a file-like object.

In [31]:
dta = []

with open("data/health_inspection_chi_sample.json") as json_file:
    for line in json_file:
        line = json.loads(line)
        dta.append(line)

pprint(dta[0])

{'address': '5255 W MADISON ST ',
 'aka_name': 'RED SNAPPER FISH CHICKEN & PIZZA',
 'city': 'CHICAGO',
 'dba_name': 'RED SNAPPER FISH CHICKEN & PIZZA',
 'facility_type': 'Restaurant',
 'inspection_date': '2016-09-26T00:00:00.000',
 'inspection_id': 1965287,
 'inspection_type': 'Canvass',
 'latitude': 41.8802365439,
 'license_': 1991820.0,
 'location': {'coordinates': [-87.7572203921, 41.8802365439], 'type': 'Point'},
 'longitude': -87.7572203921,
 'results': 'Pass w/ Conditions',
 'risk': 'Risk 1 (High)',
 'state': 'IL',
 'violations': '35. WALLS, CEILINGS, ATTACHED EQUIPMENT CONSTRUCTED PER CODE: '
               'GOOD REPAIR, SURFACES CLEAN AND DUST-LESS CLEANING METHODS - '
               'Comments: MUST CLEAN THE WALLS AT WALL BASE NEAR THE MIXER IN '
               'REAR OF PREMISES AND THE PREP AREA OF FOOD SPILLS AND CLEAN '
               'THE WALL VENT IN PREP AREA ,INSTRUCTED TO CLEAN AND MAINTAIN '
               'AREA | 33. FOOD AND NON-FOOD CONTACT EQUIPMENT UTENSILS CLEAN

`json.loads` places each json object into a Python dictionary, helpfully converting `null` to `None` and otherwise preserving types. It also works recursively, as we see in the `location` field.
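
To make the difference between `json.loads` and `json.load` concrete, here is a small sketch using a made-up record held in memory. The `text` value is illustrative and not taken from the sample file.

```python
import io
import json

# Hypothetical record with a null value and a nested object.
text = '{"aka_name": null, "location": {"type": "Point", "coordinates": [-87.7, 41.8]}}'

from_string = json.loads(text)            # parses a str
from_file = json.load(io.StringIO(text))  # parses a file-like object

print(from_string["aka_name"] is None)         # null became None
print(type(from_string["location"]).__name__)  # nested object became a dict
```

Both calls produce the same dictionary; the only difference is where the text comes from.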

## Aside: List Comprehensions

Let's take a look at another Pythonic concept, introduced a bit above, called a **list comprehension**. This is what's called *syntactic sugar*. It's a concise way to create a list.

In [32]:
[i for i in range(1, 6)]

[1, 2, 3, 4, 5]

Alternatively, we could have made this list by writing

In [33]:
result_list = []

for i in range(1, 6):
    result_list.append(i)

result_list

[1, 2, 3, 4, 5]

List comprehensions can contain logic.

In [34]:
x = ['a', 'b', 'c', 'd', '_e', '_f']

In [36]:
[i for i in x if not i.startswith('_')]

['a', 'b', 'c', 'd']

You can also use an else clause. Notice the slightly different syntax.

In [37]:
[i if not i.startswith('_') else 'skipped' for i in x]

['a', 'b', 'c', 'd', 'skipped', 'skipped']

List comprehensions can be nested, though it's usually best practice not to go overboard. They can quickly become difficult to read.

In [38]:
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

Notice that we write this from outside to in.

In [39]:
[element for row in matrix for element in row]

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
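
One way to read "outside to in" is to write the same thing as ordinary nested loops. This sketch repeats the `matrix` defined above so that it runs on its own; `flattened` is just an illustrative name.

```python
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

flattened = []
for row in matrix:       # outer loop: the first `for` in the comprehension
    for element in row:  # inner loop: the second `for`
        flattened.append(element)

# The loops and the comprehension build the same list.
print(flattened == [element for row in matrix for element in row])
```

The `for` clauses in the comprehension appear in the same order as the loops, from outermost to innermost.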

You can also create **dictionary comprehensions**.

In [40]:
pairs = [
    ("first_initial", "J"), 
    ("last_name", "Doe"), 
    ("address", "1234 Main Street")
]

In [41]:
{key: value for key, value in pairs}

{'first_initial': 'J', 'last_name': 'Doe', 'address': '1234 Main Street'}

## Exercise

Returning to the code that we introduced above, we can take further control over how a file with json objects is read in by using the `object_hook` argument in `json.loads`. Say we wanted to remove the `location` field above. We don't need the `geoJSON` formatted information. We could do so with the `object_hook`. 

The syntax is: `json.loads(json_object, object_hook=some_function)`

`json.loads` will call the function passed to `object_hook` with each decoded json object (a `dict`) and use the function's return value in place of that `dict`.

Write a function called `remove_entry` that removes the `'location'` field from each record in the `'data/health_inspection_chi_sample.json'` file. Do not forget to `import json`.

Pass this function to the `object_hook` argument of `json.loads`. Be careful: `object_hook` will get called recursively on nested json objects.

In [None]:
# Type your solution here

In [None]:
# %load solutions/object_hook_json.py
import json
from pprint import pprint


def remove_entry(record):
    try:
        del record['location']
    # this is called recursively on objects so not all have it
    except KeyError:
        pass

    return record


def parse_json(record):
    return json.loads(record, object_hook=remove_entry)


with open("data/health_inspection_chi_sample.json") as json_file:
    dta = [parse_json(line) for line in json_file]


pprint(dta[0])
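
To see the recursive behavior for yourself, the sketch below records the keys of every object the hook receives. The one-line `text` record, the `calls` list, and the `drop_location` name are made up for illustration.

```python
import json

calls = []  # keys of each object passed to the hook, in call order


def drop_location(obj):
    calls.append(sorted(obj))
    obj.pop("location", None)  # no error if the key is absent
    return obj


text = '{"inspection_id": 1, "location": {"type": "Point", "coordinates": [-87.7, 41.8]}}'
record = json.loads(text, object_hook=drop_location)

print(calls)
print(record)
```

The nested `location` object reaches the hook before the record that contains it, which is why the hook has to tolerate objects without a `location` key.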


### Reading Data: pandas

Now let's take a look at using [pandas](https://pandas.pydata.org/) to read data.

#### Introducing Pandas

First, a few words of introduction for **pandas**. Pandas is a Python package, built on top of **numpy**, that provides fast, flexible, and expressive data structures designed to work with relational or labeled data. It is a high-level tool for doing practical, real world data analysis in Python.

You reach for pandas when you have:

* Tabular data with heterogeneously-typed columns
* Ordered and unordered (not necessarily fixed-frequency) time series data.
* Arbitrary matrix data with row and column labels

Almost any dataset can be converted to a pandas data structure for cleaning, transformation, and analysis. Incidentally, the pandas DataFrame also works well with __*R*__.


First, let's import pandas.

In [44]:
import pandas as pd

Let's set some display options.

In [45]:
# Use the full option name; a partial pattern can match more than one option
pd.set_option("display.max_rows", 6)

Pandas has facilities for reading csv files and files containing JSON records (and other formats). We can use `read_csv` for csv files.

In [46]:
pd.read_csv("data/health_inspection_chi_sample.csv")

Unnamed: 0,address,aka_name,city,dba_name,facility_type,inspection_date,inspection_id,inspection_type,latitude,license_,location,longitude,results,risk,state,violations,zip
0,5255 W MADISON ST,RED SNAPPER FISH CHICKEN & PIZZA,CHICAGO,RED SNAPPER FISH CHICKEN & PIZZA,Restaurant,2016-09-26T00:00:00.000,1965287,Canvass,41.880237,1991820.0,"{'type': 'Point', 'coordinates': [-87.75722039...",-87.757220,Pass w/ Conditions,Risk 1 (High),IL,"35. WALLS, CEILINGS, ATTACHED EQUIPMENT CONSTR...",60644.0
1,5958 W DIVERSEY AVE,TAQUERIA MORELOS,CHICAGO,TAQUERIA MORELOS,Restaurant,2014-02-06T00:00:00.000,1329698,Canvass,41.931250,2099479.0,"{'type': 'Point', 'coordinates': [-87.77590699...",-87.775907,Pass,Risk 1 (High),IL,33. FOOD AND NON-FOOD CONTACT EQUIPMENT UTENSI...,60639.0
2,5400-5402 N CLARK ST,HAMBURGER MARY'S/MARY'S REC ROOM,CHICAGO,HAMBURGER MARY'S CHICAGO/MARY'S REC ROOM,Restaurant,2010-12-03T00:00:00.000,470787,SFP,41.979884,1933748.0,"{'type': 'Point', 'coordinates': [-87.66842948...",-87.668429,Fail,Risk 1 (High),IL,"6. HANDS WASHED AND CLEANED, GOOD HYGIENIC PRA...",60640.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
997,3251 E 92ND ST,VICTORY CENTRE OF SOUTH CHICAGO SLF,CHICAGO,VICTORY CENTRE OF SOUTH CHICAGO SLF,ASSISTED LIVING,2012-10-12T00:00:00.000,1296216,Canvass,41.728214,1968546.0,"{'type': 'Point', 'coordinates': [-87.54469279...",-87.544693,Pass,Risk 1 (High),IL,"34. FLOORS: CONSTRUCTED PER CODE, CLEANED, GOO...",60617.0
998,4300 S UNION AVE,TUXEDO JUNCTION #2,CHICAGO,TUXEDO JUNCTION #2,Grocery Store,2011-07-14T00:00:00.000,585793,Canvass,41.816186,8974.0,"{'type': 'Point', 'coordinates': [-87.64345182...",-87.643452,Fail,Risk 2 (Medium),IL,"16. FOOD PROTECTED DURING STORAGE, PREPARATION...",60609.0
999,3820 W CHICAGO AVE,Bro-N-Laws,CHICAGO,Bro-N-Laws,Restaurant,2015-08-19T00:00:00.000,1566470,Complaint Re-Inspection,41.895513,2288968.0,"{'type': 'Point', 'coordinates': [-87.72216981...",-87.722170,Pass,Risk 1 (High),IL,45. FOOD HANDLER REQUIREMENTS MET - Comments: ...,60651.0


`read_csv` is one of the best/worst functions in all of Python. It's great because it does just about everything. It's terrible because it does just about everything. Chances are that if you have a special case, pandas `read_csv` will accommodate your needs. Go ahead and have a look at the `docstring`.


In [47]:
pd.read_csv?

`read_csv` returns a pandas DataFrame. We'll take a deeper dive into DataFrames next when we start to clean this data set.
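
As a taste of what those options can do, here is a minimal sketch using two of them, `dtype` and `parse_dates`. In the `read_csv` output above, `zip` is displayed as a float and `inspection_date` as plain text. The in-memory `sample` below is a made-up stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical three-column extract of the inspections data.
sample = (
    "dba_name,inspection_date,zip\n"
    "RED SNAPPER FISH CHICKEN & PIZZA,2016-09-26T00:00:00.000,60644\n"
    "TAQUERIA MORELOS,2014-02-06T00:00:00.000,60639\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    dtype={"zip": str},               # keep ZIP codes as text
    parse_dates=["inspection_date"],  # parse this column as datetimes
)

print(df.dtypes)
```

Reading ZIP codes as strings avoids the trailing `.0` and preserves any leading zeros.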

The JSON counterpart to `read_csv` is `read_json`.

## Exercise

Use `pd.read_json` to read in the Chicago health inspections json sample in the `data` folder. 

__Hints__: 

`read_json` has an `orient` kwarg that determines how pandas imports the data. Two of the options are `records` and `index`. For our example, since the json data is a series of json objects, each containing a separate record, the proper value for `orient` is "records". Try "index" to see how the data differs.

Also, the `lines` kwarg, indicating whether the file being read consists of a json object per line, must be `True`.



In [None]:
# Type your solution here

In [61]:
# %load solutions/read_json.py
import pandas as pd

pd.read_json(
    "data/health_inspection_chi_sample.json",
    orient="records",
    lines=True
)


Unnamed: 0,address,aka_name,city,dba_name,facility_type,inspection_date,inspection_id,inspection_type,latitude,license_,location,longitude,results,risk,state,violations,zip
0,5255 W MADISON ST,RED SNAPPER FISH CHICKEN & PIZZA,CHICAGO,RED SNAPPER FISH CHICKEN & PIZZA,Restaurant,2016-09-26T00:00:00.000,1965287,Canvass,41.880237,1991820.0,"{'type': 'Point', 'coordinates': [-87.75722039...",-87.757220,Pass w/ Conditions,Risk 1 (High),IL,"35. WALLS, CEILINGS, ATTACHED EQUIPMENT CONSTR...",60644
1,5958 W DIVERSEY AVE,TAQUERIA MORELOS,CHICAGO,TAQUERIA MORELOS,Restaurant,2014-02-06T00:00:00.000,1329698,Canvass,41.931250,2099479.0,"{'type': 'Point', 'coordinates': [-87.77590699...",-87.775907,Pass,Risk 1 (High),IL,33. FOOD AND NON-FOOD CONTACT EQUIPMENT UTENSI...,60639
2,5400-5402 N CLARK ST,HAMBURGER MARY'S/MARY'S REC ROOM,CHICAGO,HAMBURGER MARY'S CHICAGO/MARY'S REC ROOM,Restaurant,2010-12-03T00:00:00.000,470787,SFP,41.979884,1933748.0,"{'type': 'Point', 'coordinates': [-87.66842948...",-87.668429,Fail,Risk 1 (High),IL,"6. HANDS WASHED AND CLEANED, GOOD HYGIENIC PRA...",60640
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
997,3251 E 92ND ST,VICTORY CENTRE OF SOUTH CHICAGO SLF,CHICAGO,VICTORY CENTRE OF SOUTH CHICAGO SLF,ASSISTED LIVING,2012-10-12T00:00:00.000,1296216,Canvass,41.728214,1968546.0,"{'type': 'Point', 'coordinates': [-87.54469279...",-87.544693,Pass,Risk 1 (High),IL,"34. FLOORS: CONSTRUCTED PER CODE, CLEANED, GOO...",60617
998,4300 S UNION AVE,TUXEDO JUNCTION #2,CHICAGO,TUXEDO JUNCTION #2,Grocery Store,2011-07-14T00:00:00.000,585793,Canvass,41.816186,8974.0,"{'type': 'Point', 'coordinates': [-87.64345182...",-87.643452,Fail,Risk 2 (Medium),IL,"16. FOOD PROTECTED DURING STORAGE, PREPARATION...",60609
999,3820 W CHICAGO AVE,Bro-N-Laws,CHICAGO,Bro-N-Laws,Restaurant,2015-08-19T00:00:00.000,1566470,Complaint Re-Inspection,41.895513,2288968.0,"{'type': 'Point', 'coordinates': [-87.72216981...",-87.722170,Pass,Risk 1 (High),IL,45. FOOD HANDLER REQUIREMENTS MET - Comments: ...,60651
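
The same call can be tried without the data file. This sketch feeds `read_json` two made-up records, one json object per line, from an in-memory buffer; the `json_lines` content is illustrative only.

```python
import io

import pandas as pd

# Hypothetical newline-delimited json: one object per line.
json_lines = (
    '{"inspection_id": 1965287, "results": "Pass w/ Conditions"}\n'
    '{"inspection_id": 1329698, "results": "Pass"}\n'
)

df = pd.read_json(io.StringIO(json_lines), orient="records", lines=True)
print(df)
```

Each line becomes one row, and the object keys become the column names.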


### Reading Data: web data

So far, we've seen some ways that we can read data from disk. As Data Scientists, we often need to go out and grab data from the Internet.

Generally Python is "batteries included" and reading data from the Internet is no exception, but there are some *great* packages out there. [requests](http://docs.python-requests.org/en/master/) is one of them for making HTTP requests.

Let's look at how we can use the [Chicago Data Portal](https://data.cityofchicago.org/) API to get this data in the first place. (I originally used San Francisco for this, but the data was just too clean to be terribly interesting.)

In [62]:
import requests

We use the requests library to perform a GET request to the API, passing an optional query string via `params` to limit the returned number of records. The parameters are documented as part of the [Socrata Open Data API](https://dev.socrata.com/consumers/getting-started.html) (SODA).

In [63]:
response = requests.get(
    "https://data.cityofchicago.org/resource/cwig-ma7x.json", 
    params="$limit=10"
)

Requests returns a [Response](http://docs.python-requests.org/en/master/api/#requests.Response) object with many helpful methods and attributes.

In [64]:
response

<Response [200]>

In [65]:
response.ok

True

In [66]:
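from io import StringIO

# Wrap the decoded body in a file-like object; recent pandas versions
# deprecate passing literal json text directly to read_json
dta = pd.read_json(StringIO(response.text), orient='records')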

We can use the `head` method to peek at the first 5 rows of data.

In [67]:
dta.head()

Unnamed: 0,address,aka_name,city,dba_name,facility_type,inspection_date,inspection_id,inspection_type,latitude,license_,location,longitude,results,risk,state,violations,zip
0,4949-4951 N BROADWAY,IMMM,CHICAGO,IMMM,Restaurant,2015-12-15T00:00:00.000,1591712,License,41.972963,2437509,"{'type': 'Point', 'coordinates': [-87.65962861...",-87.659629,Fail,Risk 1 (High),IL,"11. ADEQUATE NUMBER, CONVENIENT, ACCESSIBLE, D...",60640
1,2804 N CLARK ST,Wells Street Popcorn,CHICAGO,Wells Street Popcorn,Restaurant,2010-02-01T00:00:00.000,68091,Canvass,41.932921,1954774,"{'type': 'Point', 'coordinates': [-87.64515454...",-87.645155,Pass,Risk 2 (Medium),IL,,60657
2,100 N LA SALLE ST,PRET A MANGER,CHICAGO,PRET A MANGER,Restaurant,2010-12-06T00:00:00.000,409432,License,41.883237,2068718,"{'type': 'Point', 'coordinates': [-87.63255557...",-87.632556,Pass,Risk 1 (High),IL,,60602
3,10806 S MICHIGAN AVE,"EDDIE'S SUNSHINE FOOD & LIQUOR, INC",CHICAGO,"EDDIE'S SUNSHINE FOOD & LIQUOR, INC",Grocery Store,2012-09-11T00:00:00.000,1154555,Canvass,41.697818,24217,"{'type': 'Point', 'coordinates': [-87.62101100...",-87.621011,Fail,Risk 2 (Medium),IL,18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN...,60628
4,219 W WASHINGTON ST,MED KITCHEN,CHICAGO,MED KITCHEN,Restaurant,2011-07-28T00:00:00.000,614371,Canvass,41.8831,2055163,"{'type': 'Point', 'coordinates': [-87.63443384...",-87.634434,Pass,Risk 1 (High),IL,"35. WALLS, CEILINGS, ATTACHED EQUIPMENT CONSTR...",60606


Of course, pandas can also load data directly from a URL, but I encourage you to reach for `requests` as often as you need it.
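
For completeness, the "batteries included" route can be sketched with the standard library alone. The helper names `build_url` and `fetch_records` are hypothetical, and `fetch_records` is defined but not called here because it needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.cityofchicago.org/resource/cwig-ma7x.json"


def build_url(base_url, limit):
    """Append a SODA `$limit` query string to the endpoint URL."""
    # safe="$" keeps the dollar sign readable instead of encoding it as %24.
    return f"{base_url}?{urlencode({'$limit': limit}, safe='$')}"


def fetch_records(url, timeout=10):
    """Download and decode the json payload (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = build_url(BASE_URL, 10)
print(url)
```

This works, but `requests` handles query parameters, status checks, and decoding with less ceremony, which is a large part of its appeal.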

## Exercise

Try passing the URL above to `pd.read_json`. What happens?

In [None]:
# Type your solution here

In [69]:
# %load solutions/read_url_json.py
import pandas as pd

url = ('https://data.cityofchicago.org/'
       'resource/cwig-ma7x.json?$limit=5')
pd.read_json(url, orient='records')


Unnamed: 0,address,aka_name,city,dba_name,facility_type,inspection_date,inspection_id,inspection_type,latitude,license_,location,longitude,results,risk,state,violations,zip
0,2804 N CLARK ST,Wells Street Popcorn,CHICAGO,Wells Street Popcorn,Restaurant,2010-02-01T00:00:00.000,68091,Canvass,41.932921,1954774,"{'type': 'Point', 'coordinates': [-87.64515454...",-87.645155,Pass,Risk 2 (Medium),IL,,60657
1,6744 N SHERIDAN RD,RICE THAI CAFE,CHICAGO,RICE THAI CAFE,Restaurant,2015-08-21T00:00:00.000,1482935,Canvass Re-Inspection,42.004881,2354674,"{'type': 'Point', 'coordinates': [-87.66101071...",-87.661011,Pass,Risk 1 (High),IL,32. FOOD AND NON-FOOD CONTACT SURFACES PROPERL...,60626
2,160-164 E SUPERIOR ST,GINO'S EAST PIZZERIA,CHICAGO,GINO'S EAST PIZZERIA,Restaurant,2015-07-28T00:00:00.000,1447916,Suspected Food Poisoning Re-inspection,41.895863,1697132,"{'type': 'Point', 'coordinates': [-87.62325304...",-87.623253,Pass,Risk 1 (High),IL,32. FOOD AND NON-FOOD CONTACT SURFACES PROPERL...,60611
3,5844-5846 N BROADWAY,RAS DASHEN ETHIOPIAN RESTAURANT INC,CHICAGO,RAS DASHEN ETHIOPIAN RESTAURANT INC,Restaurant,2017-01-04T00:00:00.000,1978933,Canvass,41.988326,1122395,"{'type': 'Point', 'coordinates': [-87.66036036...",-87.66036,Fail,Risk 1 (High),IL,"16. FOOD PROTECTED DURING STORAGE, PREPARATION...",60660
4,2352-2358 N MILWAUKEE AVE,EAST ROOM,CHICAGO,EAST ROOM,Liquor,2013-11-18T00:00:00.000,1375515,License Re-Inspection,41.923873,2263696,"{'type': 'Point', 'coordinates': [-87.69916285...",-87.699163,Pass,Risk 3 (Low),IL,,60647


Notice how you can split a string across lines. This is a handy way to improve readability: Python joins adjacent string literals inside the parentheses into a single string.

In [70]:
("super "
 "long "
 "string "
 "split "
 "across "
 "lines")

'super long string split across lines'

#### Pandas DataReader

In addition to the core I/O functionality in pandas, there is also the [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/) project. This package provides programmatic access to data sets from sources including:

* Yahoo! Finance (deprecated)
* Google Finance
* Enigma
* Quandl
* FRED
* Fama/French
* World Bank
* OECD
* Eurostat
* EDGAR Index (deprecated)
* TSP Fund Data
* Nasdaq Trader Symbol Definitions
* Morningstar
* Etc.

#### Further Resources

Sometimes we need to be resourceful in order to get data. Knowing how to scrape the web can really come in handy.

We're not going to go into details today, but you'll likely find libraries like [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/), [lxml](http://lxml.de/), and [mechanize](https://mechanize.readthedocs.io/en/latest/) to be helpful. 

There's also a `read_html` function in pandas that will quickly scrape HTML tables for you and put them into a DataFrame. 