In [1]:
## all imports
from IPython.display import HTML
import numpy as np
import bs4 #this is beautiful soup

from pandas import Series
import pandas as pd
from pandas import DataFrame

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline


# Data Scraping

In this module, we'll focus on a kind of data that has become extremely common on the Internet: text data. In principle, text is just another form of data, and text processing is just another part of "data wrangling". Text is advantageous in that there is so much of it available, but it is challenging because it is "unstructured": it does not have the usual tabular layout, with fields. It's also relatively "dirty": people misspell wrods and runwordstogether #unpredictably.

Today, we'll talk about data scraping as a way of obtaining data from webpages. There is low-level scraping, where you parse the data out of the HTML code of the webpage. There is also scraping over APIs (Application Programming Interfaces), offered by websites that try to make your life a bit easier. An API is basically a language that helps you access features of a dataset, like text on a webpage.


## Scraping:  HTML and APIs with Python


## 4. Web Scraping using Beautiful Soup

Let's scrape some data using a fun library called Beautiful Soup. We'll work toward a CSV dataset built from a table of U.S. county housing statistics.

The website we are going to scrape is here.

[County Housing Statistics](http://duspviz.mit.edu/_assets/data/county_housing_stats.html)

Let's get started!

#### Importing Modules

First import modules. **import requests** imports the requests module, and **import bs4** imports the Beautiful Soup library.

FYI: This tutorial is based on material developed by [DUSPviz](http://duspviz.mit.edu/tutorials/python-scraping/).

In [2]:
import bs4
import requests

#### Testing out Requests

Requests allows us to load a webpage into Python so that we can parse and manipulate it. Test this by running the following cell.

The cell gives us access to all of the content from the source code of the webpage, which we can now parse to extract data. It even prints the page to our console. Pretty cool!

In [3]:
response = requests.get('http://duspviz.mit.edu/_assets/data/county_housing_stats.html')
print(response.text) # Print the output


<html><head><title>US County Housing Stats</title>
<style>
td {
	text-align: center;
}
</style>
</head>
<body>
<p class="title"><b>U.S. Basic Housing Stats by County</b></p>
<p>American Community Survey 5-year Estimates (2010-2015)</p>
<p class="story">The following are some basic county level housing stats. Some sample links are here:
<a href="http://example.com/link1" class="link" id="link1">Link 1</a>,
<a href="http://example.com/link2" class="link" id="link2">Link 2</a>,
<a href="http://example.com/link3" class="link" id="link3">Link 3</a>, and
<a href="http://example.com/link4" class="link" id="link4">Link 4</a>;
This data is only available here.</p>
<table><thead><tr><th>County Name</th><th>FIPS Code</th><th>Total Pop</th><th>Median Income ($)</th>	<th>No. of Housing Units</th><th>Median Home Value ($)</th><th>No. of Owner Occupied Housing Units</th><th>No. of Own Occ. Housing Units with Debt</th><th>No. of Own Occ. Housing Units without Debt</th></tr></thead><tbody>
<tr><td cla
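
A request can fail or come back with an error page, so it can help to check the response before parsing it. The following is a minimal sketch, not part of the original tutorial: `fetch_html` and its 10-second default timeout are hypothetical choices made for illustration.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as text, raising on HTTP errors.

    `fetch_html` and the default timeout are hypothetical names and
    values chosen for this sketch.
    """
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses
    response.raise_for_status()
    return response.text
```

Calling `fetch_html('http://duspviz.mit.edu/_assets/data/county_housing_stats.html')` should return the same text as `response.text` in the cell above, assuming the page is reachable.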

### Testing out Beautiful Soup
Our next big step is to test out Beautiful Soup. Let's talk about what this is...

What is Beautiful Soup?
Beautiful Soup is a Python library for parsing data out of HTML and XML files (aka webpages). It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. The major concept with Beautiful Soup is that it allows you to access elements of your page by following the CSS structures, such as grabbing all links, all headers, specific classes, or more. It is a powerful library. Once we grab elements, Python makes it easy to write the elements or relevant components of the elements into other files, such as a CSV, that can be stored in a database or opened in other software.

The sample webpage we are using contains basic housing statistics for U.S. counties. Let's use this file to explore the tree and extract some data.

Our first step is to *Make the Soup*

First, we have to turn the website code into a Python object. We have already imported the Beautiful Soup library, so we can start calling some of the methods in the library. Instead of printing response.text, we pass it to the following command, which turns the text into a Python object named soup.

An important note: You should specify the parser that Beautiful Soup uses to parse your text. This is done in the second argument of the BeautifulSoup function. If you leave it out, Beautiful Soup picks the best parser installed on your system and issues a warning, so results can differ between machines. Here we name the built-in Python parser explicitly, using html.parser

You can also use lxml or html5lib. This is nicely described in the [documentation](http://www.crummy.com/software/BeautifulSoup/bs4/doc/). For our purposes, the built-in parser is fine.

Using the Beautiful Soup prettify() function, we can print the page to see the code printed in a readable and legible manner.

At any point, if you need a reference, visit the Beautiful Soup [documentation](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) for the official descriptions of functions. Prettify is a handy one to see our document in a clean fashion.

In [4]:
soup = bs4.BeautifulSoup(response.text, "html.parser")
print(soup.prettify()) # Print the output using the 'prettify' function

<html>
 <head>
  <title>
   US County Housing Stats
  </title>
  <style>
   td {
	text-align: center;
}
  </style>
 </head>
 <body>
  <p class="title">
   <b>
    U.S. Basic Housing Stats by County
   </b>
  </p>
  <p>
   American Community Survey 5-year Estimates (2010-2015)
  </p>
  <p class="story">
   The following are some basic county level housing stats. Some sample links are here:
   <a class="link" href="http://example.com/link1" id="link1">
    Link 1
   </a>
   ,
   <a class="link" href="http://example.com/link2" id="link2">
    Link 2
   </a>
   ,
   <a class="link" href="http://example.com/link3" id="link3">
    Link 3
   </a>
   , and
   <a class="link" href="http://example.com/link4" id="link4">
    Link 4
   </a>
   ;
This data is only available here.
  </p>
  <table>
   <thead>
    <tr>
     <th>
      County Name
     </th>
     <th>
      FIPS Code
     </th>
     <th>
      Total Pop
     </th>
     <th>
      Median Income ($)
     </th>
     <th>
      No. of Hous
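
To see what "making the soup" does without a network call, the same constructor can be run on a small inline HTML string. This is only a sketch: `sample_html` is a made-up fragment modeled on the page above, and `sample_soup` is a hypothetical name.

```python
import bs4

# A small, made-up HTML fragment modeled on the page above
sample_html = (
    "<html><head><title>US County Housing Stats</title></head>"
    "<body><p class='title'><b>U.S. Basic Housing Stats by County</b></p>"
    "</body></html>"
)

# The second argument names the parser; "html.parser" ships with Python
sample_soup = bs4.BeautifulSoup(sample_html, "html.parser")

print(type(sample_soup).__name__)  # class of the parsed document object
print(sample_soup.title.string)    # text inside <title>
print(sample_soup.prettify())      # indented view of the parse tree
```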

## Navigating the Data Structure

With our data from the webpage nicely laid out, Beautiful Soup now lets us navigate the data structure. We called our Beautiful Soup object soup, so we can run the Beautiful Soup functions on this object. Let's explore some ways to do this. Try running some of the following cells.

In [5]:
# Access the title element
soup.title

<title>US County Housing Stats</title>

In [6]:
# Access the content of the title element
soup.title.string

u'US County Housing Stats'

In [7]:
# Access data in the first 'p' tag
soup.p

<p class="title"><b>U.S. Basic Housing Stats by County</b></p>

In [8]:
# Access data in the first 'a' tag
soup.a

<a class="link" href="http://example.com/link1" id="link1">Link 1</a>

In [9]:
# Retrieve all links in the document (note it returns a list-like ResultSet)
soup.find_all('a')

[<a class="link" href="http://example.com/link1" id="link1">Link 1</a>,
 <a class="link" href="http://example.com/link2" id="link2">Link 2</a>,
 <a class="link" href="http://example.com/link3" id="link3">Link 3</a>,
 <a class="link" href="http://example.com/link4" id="link4">Link 4</a>]

In [10]:
# Retrieve elements whose class equals 'link' using the attrs argument
soup.find_all(attrs={'class' : 'link'})

[<a class="link" href="http://example.com/link1" id="link1">Link 1</a>,
 <a class="link" href="http://example.com/link2" id="link2">Link 2</a>,
 <a class="link" href="http://example.com/link3" id="link3">Link 3</a>,
 <a class="link" href="http://example.com/link4" id="link4">Link 4</a>]

In [11]:
# Retrieve a specific link by ID
soup.find(id="link3")

<a class="link" href="http://example.com/link3" id="link3">Link 3</a>
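
Once a tag has been found, its attributes and text can be read directly. The sketch below assumes a made-up fragment, `links_html`, that mirrors the links on the sample page; it also shows `select()`, which accepts a CSS selector.

```python
import bs4

# Made-up fragment that mirrors the links on the sample page
links_html = """
<p class="story">
<a href="http://example.com/link1" class="link" id="link1">Link 1</a>,
<a href="http://example.com/link2" class="link" id="link2">Link 2</a>
</p>
"""
links_soup = bs4.BeautifulSoup(links_html, "html.parser")

# Tag attributes behave like dictionary entries
first = links_soup.find("a")
print(first["href"])     # the link target
print(first.get("id"))   # .get() returns None if the attribute is missing
print(first.get_text())  # the visible link text

# select() takes a CSS selector: every <a> with class "link"
hrefs = [a["href"] for a in links_soup.select("a.link")]
print(hrefs)
```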

In [12]:
# Access data in the table (note it returns a list-like ResultSet)
soup.find_all('td')

[<td class="name">Autauga County, Alabama</td>,
 <td class="fips">01001</td>,
 <td class="tot-pop">55221</td>,
 <td class="median-income">51281</td>,
 <td class="no-housing-units">22582</td>,
 <td class="med-home-val">141300</td>,
 <td class="owner-occupied">15077</td>,
 <td class="house-w-debt">9668</td>,
 <td class="house-wo-debt">5409</td>,
 <td class="name">Baldwin County, Alabama</td>,
 <td class="fips">01003</td>,
 <td class="tot-pop">195121</td>,
 <td class="median-income">50254</td>,
 <td class="no-housing-units">106422</td>,
 <td class="med-home-val">169300</td>,
 <td class="owner-occupied">52997</td>,
 <td class="house-w-debt">31824</td>,
 <td class="house-wo-debt">21173</td>,
 <td class="name">Barbour County, Alabama</td>,
 <td class="fips">01005</td>,
 <td class="tot-pop">26932</td>,
 <td class="median-income">32964</td>,
 <td class="no-housing-units">11810</td>,
 <td class="med-home-val">92200</td>,
 <td class="owner-occupied">5864</td>,
 <td class="house-w-debt">2691</td>
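
`find_all('td')` returns every cell in one flat sequence, which loses the row boundaries. One way to keep them, sketched here under the assumption that each county sits in its own `tr`, is to loop over rows first and cells second. `table_html` is a made-up two-row excerpt trimmed to three columns.

```python
import bs4

# Made-up two-row excerpt using the same class names as the sample page
table_html = """
<table><thead><tr><th>County Name</th><th>FIPS Code</th><th>Total Pop</th></tr></thead>
<tbody>
<tr><td class="name">Autauga County, Alabama</td><td class="fips">01001</td><td class="tot-pop">55221</td></tr>
<tr><td class="name">Baldwin County, Alabama</td><td class="fips">01003</td><td class="tot-pop">195121</td></tr>
</tbody></table>
"""
table_soup = bs4.BeautifulSoup(table_html, "html.parser")

# One inner list per <tr>, one string per <td>
rows = []
for tr in table_soup.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row holds <th>, not <td>, so it comes back empty
        rows.append(cells)

print(rows)
```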

## Working with Lists
The easiest way to access elements and then either write them to a file or manipulate them is to save them as objects themselves. Note that our data is organized into counties, each followed by several numbers. Let's save these to lists, which are the easiest way to work with the data.

The following gives us a list whose elements we can work with.

In [13]:
data = soup.find_all(attrs={'class':'name'})
data[0]

<td class="name">Autauga County, Alabama</td>

In [14]:
data = soup.find_all(attrs={'class':'name'})
print(data[0].string)
print(data[1].string)
print(data[2].string)
print(data[3].string)

Autauga County, Alabama
Baldwin County, Alabama
Barbour County, Alabama
Bibb County, Alabama


In [15]:
data = soup.find_all(attrs={'class':'name'})
for i in data:
    print(i.string)

Autauga County, Alabama
Baldwin County, Alabama
Barbour County, Alabama
Bibb County, Alabama
Blount County, Alabama
Bullock County, Alabama
Butler County, Alabama
Calhoun County, Alabama
Chambers County, Alabama
Cherokee County, Alabama
Chilton County, Alabama
Choctaw County, Alabama
Clarke County, Alabama
Clay County, Alabama
Cleburne County, Alabama
Coffee County, Alabama
Colbert County, Alabama
Conecuh County, Alabama
Coosa County, Alabama
Covington County, Alabama
Crenshaw County, Alabama
Cullman County, Alabama
Dale County, Alabama
Dallas County, Alabama
DeKalb County, Alabama
Elmore County, Alabama
Escambia County, Alabama
Etowah County, Alabama
Fayette County, Alabama
Franklin County, Alabama
Geneva County, Alabama
Greene County, Alabama
Hale County, Alabama
Henry County, Alabama
Houston County, Alabama
Jackson County, Alabama
Jefferson County, Alabama
Lamar County, Alabama
Lauderdale County, Alabama
Lawrence County, Alabama
Lee County, Alabama
Limestone County, Alabama
Lowndes 

Sumter County, Florida
Suwannee County, Florida
Taylor County, Florida
Union County, Florida
Volusia County, Florida
Wakulla County, Florida
Walton County, Florida
Washington County, Florida
Appling County, Georgia
Atkinson County, Georgia
Bacon County, Georgia
Baker County, Georgia
Baldwin County, Georgia
Banks County, Georgia
Barrow County, Georgia
Bartow County, Georgia
Ben Hill County, Georgia
Berrien County, Georgia
Bibb County, Georgia
Bleckley County, Georgia
Brantley County, Georgia
Brooks County, Georgia
Bryan County, Georgia
Bulloch County, Georgia
Burke County, Georgia
Butts County, Georgia
Calhoun County, Georgia
Camden County, Georgia
Candler County, Georgia
Carroll County, Georgia
Catoosa County, Georgia
Charlton County, Georgia
Chatham County, Georgia
Chattahoochee County, Georgia
Chattooga County, Georgia
Cherokee County, Georgia
Clarke County, Georgia
Clay County, Georgia
Clayton County, Georgia
Clinch County, Georgia
Cobb County, Georgia
Coffee County, Georgia
Colquit

Stanton County, Nebraska
Thayer County, Nebraska
Thomas County, Nebraska
Thurston County, Nebraska
Valley County, Nebraska
Washington County, Nebraska
Wayne County, Nebraska
Webster County, Nebraska
Wheeler County, Nebraska
York County, Nebraska
Churchill County, Nevada
Clark County, Nevada
Douglas County, Nevada
Elko County, Nevada
Esmeralda County, Nevada
Eureka County, Nevada
Humboldt County, Nevada
Lander County, Nevada
Lincoln County, Nevada
Lyon County, Nevada
Mineral County, Nevada
Nye County, Nevada
Pershing County, Nevada
Storey County, Nevada
Washoe County, Nevada
White Pine County, Nevada
Carson City, Nevada
Belknap County, New Hampshire
Carroll County, New Hampshire
Cheshire County, New Hampshire
Coos County, New Hampshire
Grafton County, New Hampshire
Hillsborough County, New Hampshire
Merrimack County, New Hampshire
Rockingham County, New Hampshire
Strafford County, New Hampshire
Sullivan County, New Hampshire
Atlantic County, New Jersey
Bergen County, New Jersey
Burlingt

Wyoming County, West Virginia
Adams County, Wisconsin
Ashland County, Wisconsin
Barron County, Wisconsin
Bayfield County, Wisconsin
Brown County, Wisconsin
Buffalo County, Wisconsin
Burnett County, Wisconsin
Calumet County, Wisconsin
Chippewa County, Wisconsin
Clark County, Wisconsin
Columbia County, Wisconsin
Crawford County, Wisconsin
Dane County, Wisconsin
Dodge County, Wisconsin
Door County, Wisconsin
Douglas County, Wisconsin
Dunn County, Wisconsin
Eau Claire County, Wisconsin
Florence County, Wisconsin
Fond du Lac County, Wisconsin
Forest County, Wisconsin
Grant County, Wisconsin
Green County, Wisconsin
Green Lake County, Wisconsin
Iowa County, Wisconsin
Iron County, Wisconsin
Jackson County, Wisconsin
Jefferson County, Wisconsin
Juneau County, Wisconsin
Kenosha County, Wisconsin
Kewaunee County, Wisconsin
La Crosse County, Wisconsin
Lafayette County, Wisconsin
Langlade County, Wisconsin
Lincoln County, Wisconsin
Manitowoc County, Wisconsin
Marathon County, Wisconsin
Marinette Co
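
Instead of printing inside a loop, the same values can be collected into a plain Python list and processed further. The sketch below uses a made-up two-cell fragment (`names_html`) and splits each "County, State" string on its last comma; all variable names are hypothetical.

```python
import bs4

# Made-up fragment with two cells of class "name"
names_html = (
    '<table><tr><td class="name">Autauga County, Alabama</td></tr>'
    '<tr><td class="name">Baldwin County, Alabama</td></tr></table>'
)
names_soup = bs4.BeautifulSoup(names_html, "html.parser")

# Collect the text of every matching cell in one step
names = [td.get_text() for td in names_soup.find_all(attrs={"class": "name"})]

# Split "County, State" on the last comma into (county, state) pairs
pairs = [tuple(name.rsplit(", ", 1)) for name in names]
print(pairs)
```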

This list only gives us the county names, though. Let's get the data elements from all of the classes.



In [16]:
# Class names of the nine table columns, in page order
classes = ['name', 'fips', 'tot-pop', 'median-income', 'no-housing-units',
           'med-home-val', 'owner-occupied', 'house-w-debt', 'house-wo-debt']
data = soup.find_all(attrs={'class': classes})
for i in data:
    print(i.string)

Autauga County, Alabama
01001
55221
51281
22582
141300
15077
9668
5409
Baldwin County, Alabama
01003
195121
50254
106422
169300
52997
31824
21173
Barbour County, Alabama
01005
26932
32964
11810
92200
5864
2691
3173
Bibb County, Alabama
01007
22604
38678
8971
102700
5278
2605
2673
Blount County, Alabama
01009
57710
45813
23860
119800
16423
8849
7574
Bullock County, Alabama
01011
10678
31938
4465
68600
2609
866
1743
Butler County, Alabama
01013
20354
32229
9919
78900
5644
2348
3296
Calhoun County, Alabama
01015
116648
41703
53296
105900
31179
16611
14568
Chambers County, Alabama
01017
34079
34177
16936
80800
9318
4847
4471
Cherokee County, Alabama
01019
26008
36296
16242
105100
8761
3978
4783
Chilton County, Alabama
01021
43819
41627
19247
100100
12410
6309
6101
Choctaw County, Alabama
01023
13395
33536
7232
61100
4547
1606
2941
Clarke County, Alabama
01025
25070
32011
12600
86900
6372
2448
3924
Clay County, Alabama
01027
13537
35327
6738
84100
4029
1847
2182
Cleburne County, Alabama
010

Yell County, Arkansas
05149
21835
37804
9739
94900
5195
2574
2621
Alameda County, California
06001
1584983
75619
589858
543100
294644
221569
73075
Alpine County, California
06003
1131
52917
1801
295000
282
151
131
Amador County, California
06005
36995
54171
18184
251000
10577
6360
4217
Butte County, California
06007
222564
43444
97133
221700
50031
31251
18780
Calaveras County, California
06009
44767
53233
28031
243800
14266
9243
5023
Colusa County, California
06011
21396
52168
7931
196000
4331
2666
1665
Contra Costa County, California
06013
1096068
80185
405001
439900
248668
187267
61401
Del Norte County, California
06015
27788
40847
11305
183700
5704
3223
2481
El Dorado County, California
06017
182093
69584
88639
363000
49861
35908
13953
Fresno County, California
06019
956749
45233
321955
194600
156474
112196
44278
Glenn County, California
06021
28029
39349
10893
204400
5855
3661
2194
Humboldt County, California
06023
135034
42197
62156
279300
29128
17459
11669
Imperial County, Califo

1656
876
780
Rio Grande County, Colorado
08105
11745
39672
6618
133700
3078
1770
1308
Routt County, Colorado
08107
23606
64963
16378
394600
6519
4402
2117
Saguache County, Colorado
08109
6238
33393
3904
146700
1813
799
1014
San Juan County, Colorado
08111
606
36324
698
223200
182
99
83
San Miguel County, Colorado
08113
7676
56047
6687
512800
2001
1304
697
Sedgwick County, Colorado
08115
2365
44191
1287
86800
705
306
399
Summit County, Colorado
08117
28940
67983
30238
478800
6728
5168
1560
Teller County, Colorado
08119
23340
62372
12730
239000
7728
5590
2138
Washington County, Colorado
08121
4795
45541
2422
113200
1450
642
808
Weld County, Colorado
08123
270948
60572
99226
210100
65424
49449
15975
Yuma County, Colorado
08125
10185
43105
4438
136400
2433
1191
1242
Fairfield County, Connecticut
09001
939983
84233
363556
416000
228381
163508
64873
Hartford County, Connecticut
09003
896943
66395
374453
236400
224000
156517
67483
Litchfield County, Connecticut
09005
186304
72061
87447
254600

Columbia County, Georgia
13073
136204
71021
52267
170800
35613
26750
8863
Cook County, Georgia
13075
17033
35683
7247
86000
4190
2269
1921
Coweta County, Georgia
13077
133416
62461
51447
181000
35494
26672
8822
Crawford County, Georgia
13079
12539
41825
5257
85800
3609
1951
1658
Crisp County, Georgia
13081
23314
31615
10720
98200
5108
2487
2621
Dade County, Georgia
13083
16445
46434
7269
118300
4789
2394
2395
Dawson County, Georgia
13085
22673
56943
10541
188300
6191
3941
2250
Decatur County, Georgia
13087
27378
31284
12137
114400
6303
3417
2886
DeKalb County, Georgia
13089
716331
51376
306218
163000
147739
113757
33982
Dodge County, Georgia
13091
21180
34271
9767
67900
5288
2531
2757
Dooly County, Georgia
13093
14293
28696
6239
73000
3216
1292
1924
Dougherty County, Georgia
13095
93310
32084
40706
101400
16202
9625
6577
Douglas County, Georgia
13097
136520
53881
51775
121300
31687
22880
8807
Early County, Georgia
13099
10579
31680
4930
81600
2640
1224
1416
Echols County, Georgia
13101

38837
4466
126200
2865
1323
1542
Custer County, Idaho
16037
4234
39457
3133
161500
1388
657
731
Elmore County, Idaho
16039
26175
43848
12195
135300
5758
3672
2086
Franklin County, Idaho
16041
12914
48133
4631
160900
3350
1985
1365
Fremont County, Idaho
16043
12945
47988
8626
152100
3685
2068
1617
Gem County, Idaho
16045
16731
40828
7103
140200
4432
2913
1519
Gooding County, Idaho
16047
15233
39930
6059
127700
3589
1979
1610
Idaho County, Idaho
16049
16312
38191
8680
153700
5282
2621
2661
Jefferson County, Idaho
16051
26792
51171
8829
156000
6656
4177
2479
Jerome County, Idaho
16053
22653
41630
8196
135800
4864
3130
1734
Kootenai County, Idaho
16055
145046
49403
65272
185700
39582
27241
12341
Latah County, Idaho
16057
38339
42439
16255
191100
8149
5118
3031
Lemhi County, Idaho
16059
7790
34329
4740
172700
2645
1351
1294
Lewis County, Idaho
16061
3812
36505
1879
115300
1177
597
580
Lincoln County, Idaho
16063
5260
43273
1961
121000
1154
691
463
Madison County, Idaho
16065
37916
32233
123

17203
39106
65852
15325
158200
11663
7191
4472
Adams County, Indiana
18001
34642
48188
13079
116900
9548
5753
3795
Allen County, Indiana
18003
363453
49092
153860
114600
96765
65524
31241
Bartholomew County, Indiana
18005
79488
55050
33422
136000
21588
14099
7489
Benton County, Indiana
18007
8752
47046
3907
83800
2426
1397
1029
Blackford County, Indiana
18009
12476
38190
6008
67500
3877
2199
1678
Boone County, Indiana
18011
60511
67552
24565
187600
17396
13024
4372
Brown County, Indiana
18013
15011
54615
8479
162300
4842
3090
1752
Carroll County, Indiana
18015
20014
52005
9456
112100
6275
3901
2374
Cass County, Indiana
18017
38476
42290
16374
82400
10975
6456
4519
Clark County, Indiana
18019
113181
51699
48626
129000
30563
21118
9445
Clay County, Indiana
18021
26686
47602
11693
95000
7782
5009
2773
Clinton County, Indiana
18023
32835
48478
13253
98100
8434
5307
3127
Crawford County, Indiana
18025
10591
38695
5460
86300
3250
1750
1500
Daviess County, Indiana
18027
32411
47342
12462
1102

5538
125700
4174
2312
1862
Guthrie County, Iowa
19077
10740
51013
5741
103700
3634
1904
1730
Hamilton County, Iowa
19079
15297
49813
7184
93600
4465
2718
1747
Hancock County, Iowa
19081
11092
52981
5288
91600
3744
1807
1937
Hardin County, Iowa
19083
17393
51019
8180
88700
5267
2780
2487
Harrison County, Iowa
19085
14467
53567
6727
106400
4477
2612
1865
Henry County, Iowa
19087
20080
49321
8251
103900
5555
3047
2508
Howard County, Iowa
19089
9494
49869
4366
94600
3077
1648
1429
Humboldt County, Iowa
19091
9674
47252
4671
89800
3027
1498
1529
Ida County, Iowa
19093
7071
46993
3410
80300
2332
1254
1078
Iowa County, Iowa
19095
16344
59375
7255
138700
5322
3077
2245
Jackson County, Iowa
19097
19572
49028
9429
114300
6324
3282
3042
Jasper County, Iowa
19099
36726
55033
16158
117700
10543
6201
4342
Jefferson County, Iowa
19101
17318
42899
7557
101000
4598
2335
2263
Johnson County, Iowa
19103
139436
55700
57997
193600
32973
22845
10128
Jones County, Iowa
19105
20560
55060
8902
129300
6389
3661

1168
455
713
Russell County, Kansas
20167
6988
41102
3892
81000
2428
1210
1218
Saline County, Kansas
20169
55735
47801
24192
122200
14704
9296
5408
Scott County, Kansas
20171
4928
51850
2373
126000
1583
714
869
Sedgwick County, Kansas
20173
506529
50657
213700
126500
123624
80394
43230
Seward County, Kansas
20175
23274
47134
8110
87100
5060
2780
2280
Shawnee County, Kansas
20177
178792
50378
79425
122200
45873
28901
16972
Sheridan County, Kansas
20179
2531
49750
1263
91400
887
276
611
Sherman County, Kansas
20181
6054
39029
3124
77000
1620
833
787
Smith County, Kansas
20183
3740
43848
2240
64000
1338
428
910
Stafford County, Kansas
20185
4320
47377
2303
65100
1503
552
951
Stanton County, Kansas
20187
2149
43780
1014
73800
640
316
324
Stevens County, Kansas
20189
5772
55433
2294
96400
1504
721
783
Sumner County, Kansas
20191
23638
50141
10855
83900
7057
3833
3224
Thomas County, Kansas
20193
7925
48504
3539
105100
2211
1060
1151
Trego County, Kansas
20195
2951
54000
1665
86900
1004
460
5

12447
29736
5596
76600
3326
1635
1691
Pulaski County, Kentucky
21199
63635
34790
31267
106800
17669
9723
7946
Robertson County, Kentucky
21201
2208
31741
1103
104000
735
301
434
Rockcastle County, Kentucky
21203
16942
31555
7686
73900
5016
2291
2725
Rowan County, Kentucky
21205
23608
36860
10107
115300
5302
3014
2288
Russell County, Kentucky
21207
17669
30720
9942
88000
5177
2406
2771
Scott County, Kentucky
21209
50178
63027
20154
164600
13019
9705
3314
Shelby County, Kentucky
21211
44290
60324
16901
175700
10979
7309
3670
Simpson County, Kentucky
21213
17704
39679
7531
116800
4271
2497
1774
Spencer County, Kentucky
21215
17577
63000
6883
176900
5320
3928
1392
Taylor County, Kentucky
21217
24993
33340
10954
93700
6357
3429
2928
Todd County, Kentucky
21219
12524
40497
5292
87300
3287
1531
1756
Trigg County, Kentucky
21221
14250
44083
7835
114100
4946
2653
2293
Trimble County, Kentucky
21223
8783
47409
3923
112000
2686
1588
1098
Union County, Kentucky
21225
15138
40120
6183
84100
4015
22

25464
13139
Middlesex County, Massachusetts
25017
1556116
85118
617089
414600
365501
257747
107754
Nantucket County, Massachusetts
25019
10556
84057
11763
902500
2568
1793
775
Norfolk County, Massachusetts
25021
687721
88262
272397
399500
178492
128876
49616
Plymouth County, Massachusetts
25023
503681
75459
201930
328600
137942
100975
36967
Suffolk County, Massachusetts
25025
758919
55044
321386
377100
105671
79360
26311
Worcester County, Massachusetts
25027
810935
65313
328627
252600
194957
141342
53615
Alcona County, Michigan
26001
10550
38033
11056
97500
4454
2183
2271
Alger County, Michigan
26003
9476
39300
6581
116400
2977
1421
1556
Allegan County, Michigan
26005
112837
54264
49692
140400
33872
20511
13361
Alpena County, Michigan
26007
29068
38829
15983
93300
9693
5430
4263
Antrim County, Michigan
26009
23267
46845
17811
140700
8096
4754
3342
Arenac County, Michigan
26011
15424
38307
9767
87500
5352
2642
2710
Baraga County, Michigan
26013
8690
39803
5235
88100
2498
1043
1455
Barry

42444
52632
19569
160800
12125
7512
4613
Kittson County, Minnesota
27069
4480
52326
2604
72600
1545
590
955
Koochiching County, Minnesota
27071
13054
42919
7877
101400
4572
2414
2158
Lac qui Parle County, Minnesota
27073
7023
49903
3672
83000
2481
1054
1427
Lake County, Minnesota
27075
10750
48417
7715
157700
4062
2181
1881
Lake of the Woods County, Minnesota
27077
3949
42263
3671
117000
1387
798
589
Le Sueur County, Minnesota
27079
27707
60632
12468
176500
8833
5681
3152
Lincoln County, Minnesota
27081
5808
49575
3126
92500
1980
929
1051
Lyon County, Minnesota
27083
25699
51600
11117
133700
6746
3875
2871
McLeod County, Minnesota
27085
36046
56128
15746
148300
11299
7363
3936
Mahnomen County, Minnesota
27087
5496
41118
2782
97300
1468
653
815
Marshall County, Minnesota
27089
9453
54092
4804
93500
3252
1569
1683
Martin County, Minnesota
27091
20350
51391
9970
105400
6513
3584
2929
Meeker County, Minnesota
27093
23129
55042
10688
157200
7350
4512
2838
Mille Lacs County, Minnesota
27095


7239
79700
4908
1784
3124
Stone County, Mississippi
28131
17978
45025
7216
111800
4579
2020
2559
Sunflower County, Mississippi
28133
27911
27384
9702
70000
4904
2506
2398
Tallahatchie County, Mississippi
28135
14959
29731
5530
58100
3283
1525
1758
Tate County, Mississippi
28137
28415
42880
11100
110200
7389
4298
3091
Tippah County, Mississippi
28139
22054
35609
9723
77800
6414
2862
3552
Tishomingo County, Mississippi
28141
19539
35143
10303
76600
5861
2574
3287
Tunica County, Mississippi
28143
10477
31211
4811
80000
1641
836
805
Union County, Mississippi
28145
27811
35865
11686
83200
7340
3493
3847
Walthall County, Mississippi
28147
14978
31384
7147
93200
4917
2050
2867
Warren County, Mississippi
28149
48020
41121
21872
100800
11940
6894
5046
Washington County, Mississippi
28151
49499
29144
21650
73900
10010
4825
5185
Wayne County, Mississippi
28153
20564
32557
9210
74300
6487
2238
4249
Webster County, Mississippi
28155
9999
34448
4806
78500
3058
1366
1692
Wilkinson County, Mississippi

Howard County, Nebraska
31093
6347
50030
2994
109200
2007
1105
902
Jefferson County, Nebraska
31095
7433
43008
3902
71000
2552
1121
1431
Johnson County, Nebraska
31097
5167
45429
2150
76800
1389
646
743
Kearney County, Nebraska
31099
6549
51934
2922
109900
1883
940
943
Keith County, Nebraska
31101
8146
41781
5381
99700
2599
1395
1204
Keya Paha County, Nebraska
31103
711
38625
488
74400
243
60
183
Kimball County, Nebraska
31105
3720
40242
1933
80100
1020
482
538
Knox County, Nebraska
31107
8556
45411
4801
81200
2731
1002
1729
Lancaster County, Nebraska
31109
298080
51830
123773
152900
69852
48753
21099
Lincoln County, Nebraska
31111
35896
50194
16615
114200
9973
5778
4195
Logan County, Nebraska
31113
851
48281
415
114300
252
142
110
Loup County, Nebraska
31115
548
55417
442
96300
199
70
129
McPherson County, Nebraska
31117
433
54306
244
133300
128
45
83
Madison County, Nebraska
31119
35111
50218
15070
118300
9093
5306
3787
Merrick County, Nebraska
31121
7776
51012
3724
85600
2434
1133
1

59675
34273
25402
Onondaga County, New York
36067
468304
55092
203956
135900
120529
79234
41295
Ontario County, New York
36069
109192
57416
49418
145700
32228
20921
11307
Orange County, New York
36071
375384
70848
139103
262500
86184
61469
24715
Orleans County, New York
36073
42204
46359
18498
91300
12112
7415
4697
Oswego County, New York
36075
121183
47860
53656
94800
32675
19565
13110
Otsego County, New York
36077
61399
48588
30665
142800
17352
9249
8103
Putnam County, New York
36079
99488
96148
38289
354900
27946
20098
7848
Queens County, New York
36081
2301139
57720
844189
450300
340561
208490
132071
Rensselaer County, New York
36083
159900
60709
71742
179100
41240
27830
13410
Richmond County, New York
36085
472481
73197
178136
439500
114000
80370
33630
Rockland County, New York
36087
320688
84855
104442
419100
68107
47935
20172
St. Lawrence County, New York
36089
112011
44705
52203
87600
29600
15701
13899
Saratoga County, New York
36091
223774
71496
100953
232900
64423
43151
21272

Rutherford County, North Carolina
37161
66865
35630
33860
106600
18870
10069
8801
Sampson County, North Carolina
37163
63873
35490
27166
87600
16478
8140
8338
Scotland County, North Carolina
37165
35932
30958
15173
79100
8196
4395
3801
Stanly County, North Carolina
37167
60586
40910
27216
128200
16815
9953
6862
Stokes County, North Carolina
37169
46661
40696
21836
117400
14696
8698
5998
Surry County, North Carolina
37171
73170
36164
33565
115500
21094
11186
9908
Swain County, North Carolina
37173
14163
33931
8769
126700
3888
1613
2275
Transylvania County, North Carolina
37175
32928
45114
19310
192000
10476
5062
5414
Tyrrell County, North Carolina
37177
4152
32361
2003
98800
1072
546
526
Union County, North Carolina
37179
213422
65903
75313
197400
56496
43586
12910
Vance County, North Carolina
37181
44829
33316
19990
97900
10411
5847
4564
Wake County, North Carolina
37183
976019
67309
392813
234000
234084
185003
49081
Warren County, North Carolina
37185
20468
34254
11750
96400
5483
2870

Ottawa County, Ohio
39123
41162
53914
27967
138800
13775
8251
5524
Paulding County, Ohio
39125
19165
45550
8692
91900
6010
3467
2543
Perry County, Ohio
39127
36025
42017
15161
91800
10093
6042
4051
Pickaway County, Ohio
39129
56515
57439
21182
147700
14363
9757
4606
Pike County, Ohio
39131
28396
40283
12530
95000
7463
4099
3364
Portage County, Ohio
39133
161897
52552
67897
150900
41928
27522
14406
Preble County, Ohio
39135
41682
47818
17851
112700
12375
7816
4559
Putnam County, Ohio
39137
34184
60524
13768
138900
10766
6112
4654
Richland County, Ohio
39139
122312
41877
54353
102500
33004
19137
13867
Ross County, Ohio
39141
77334
43345
31917
110300
20275
12335
7940
Sandusky County, Ohio
39143
60187
47209
26257
110100
17523
10758
6765
Scioto County, Ohio
39145
78017
35903
34084
90200
20345
10009
10336
Seneca County, Ohio
39147
55929
45444
23959
96900
15358
8939
6419
Shelby County, Ohio
39149
49067
54550
20201
130000
13138
8346
4792
Stark County, Ohio
39151
374979
47137
165462
122900
1036

55936
164853
168200
110223
71966
38257
Blair County, Pennsylvania
42013
126448
43981
56109
110600
36617
19818
16799
Bradford County, Pennsylvania
42015
62228
48987
30102
135900
18168
8711
9457
Bucks County, Pennsylvania
42017
626583
77568
246515
308800
178113
122503
55610
Butler County, Pennsylvania
42019
185689
60934
79529
178100
57196
35116
22080
Cambria County, Pennsylvania
42021
139381
42107
65413
87100
42440
21006
21434
Cameron County, Pennsylvania
42023
4869
39897
4410
70600
1525
742
783
Carbon County, Pennsylvania
42025
64634
49973
34400
144700
20158
12081
8077
Centre County, Pennsylvania
42027
157823
52186
64489
197200
34606
21112
13494
Chester County, Pennsylvania
42029
509797
85976
194892
325800
139807
99596
40211
Clarion County, Pennsylvania
42031
39454
42536
19965
107100
10963
5096
5867
Clearfield County, Pennsylvania
42033
81343
42257
38614
87300
24555
11573
12982
Clinton County, Pennsylvania
42035
39614
45078
19035
117400
10549
5710
4839
Columbia County, Pennsylvania
4203

133600
5398
3008
2390
Day County, South Dakota
46037
5618
39216
3670
81100
1845
634
1211
Deuel County, South Dakota
46039
4341
53152
2223
112000
1575
827
748
Dewey County, South Dakota
46041
5579
37206
1996
60100
959
309
650
Douglas County, South Dakota
46043
2973
52500
1434
73000
1018
380
638
Edmunds County, South Dakota
46045
4018
56000
1911
111200
1297
555
742
Fall River County, South Dakota
46047
6906
45997
4171
102800
2193
928
1265
Faulk County, South Dakota
46049
2359
43679
1180
78600
732
221
511
Grant County, South Dakota
46051
7227
51272
3543
107900
2523
1328
1195
Gregory County, South Dakota
46053
4226
37540
2503
61200
1356
463
893
Haakon County, South Dakota
46055
2083
41518
1033
76800
671
246
425
Hamlin County, South Dakota
46057
5982
58602
2802
112100
1740
969
771
Hand County, South Dakota
46059
3375
47163
1831
96100
1063
403
660
Hanson County, South Dakota
46061
3386
60741
1175
110100
880
406
474
Harding County, South Dakota
46063
1328
52396
682
87200
369
124
245
Hughes Co

4952
6839
Andrews County, Texas
48003
16775
70423
6013
111600
4090
1868
2222
Angelina County, Texas
48005
87748
44223
36070
87100
20178
8691
11487
Aransas County, Texas
48007
24292
41690
15614
144000
7040
2506
4534
Archer County, Texas
48009
8779
60275
4129
107500
2789
1327
1462
Armstrong County, Texas
48011
1943
59737
933
106600
543
238
305
Atascosa County, Texas
48013
47050
52192
17788
91500
11409
5313
6096
Austin County, Texas
48015
28886
53687
12978
157000
8156
3603
4553
Bailey County, Texas
48017
7126
37397
2787
58300
1659
618
1041
Bandera County, Texas
48019
20796
49863
11592
146100
6925
3637
3288
Bastrop County, Texas
48021
76948
54821
29522
129500
19874
11058
8816
Baylor County, Texas
48023
3628
36373
2695
81400
1380
379
1001
Bee County, Texas
48025
32659
42302
10651
72900
5417
2049
3368
Bell County, Texas
48027
326041
50550
131684
127500
60615
38291
22324
Bexar County, Texas
48029
1825502
51150
675208
129400
361851
232549
129302
Blanco County, Texas
48031
10723
55504
5622
1750

49003
50991
55038
17910
167500
12708
8690
4018
Cache County, Utah
49005
117449
50497
38715
191900
23289
15839
7450
Carbon County, Utah
49007
20927
46900
9604
123900
5546
3211
2335
Daggett County, Utah
49009
776
56750
1165
174700
231
100
131
Davis County, Utah
49011
323374
71112
101756
225800
75683
57380
18303
Duchesne County, Utah
49013
19817
61133
9715
172600
4990
2821
2169
Emery County, Utah
49015
10728
49787
4483
132700
2915
1645
1270
Garfield County, Utah
49017
5069
42614
3794
156600
1409
743
666
Grand County, Utah
49019
9388
41312
4995
224800
2583
1280
1303
Iron County, Utah
49021
47139
43855
19984
165900
9597
6556
3041
Juab County, Utah
49023
10400
54761
3538
165200
2552
1631
921
Kane County, Utah
49025
7202
50194
5854
175300
2221
1326
895
Millard County, Utah
49027
12582
51593
4943
138400
3257
1738
1519
Morgan County, Utah
49029
10276
74314
3200
266400
2517
1760
757
Piute County, Utah
49031
1865
35980
916
140800
468
207
261
Rich County, Utah
49033
2292
50781
2876
173500
506
295


5179
2984
2195
Wise County, Virginia
51195
40530
37407
17851
87500
10627
5077
5550
Wythe County, Virginia
51197
29190
41360
14159
123900
8270
4260
4010
York County, Virginia
51199
66471
81749
27150
312600
17512
12661
4851
Alexandria city, Virginia
51510
149315
89134
74317
502500
28410
22824
5586
Bristol city, Virginia
51520
17524
35368
8841
114500
4257
2184
2073
Buena Vista city, Virginia
51530
6666
29097
2920
115000
1605
1005
600
Charlottesville city, Virginia
51540
45084
49775
19886
285300
7735
5319
2416
Chesapeake city, Virginia
51550
230601
68620
86657
253800
57374
45453
11921
Colonial Heights city, Virginia
51570
17515
50304
7808
169300
4461
2824
1637
Covington city, Virginia
51580
5736
34746
3035
68600
1834
951
883
Danville city, Virginia
51590
42450
32315
22376
88600
10013
5307
4706
Emporia city, Virginia
51595
5672
28601
2686
115000
1050
567
483
Fairfax city, Virginia
51600
23402
105297
8780
470300
5925
4510
1415
Falls Church city, Virginia
51610
13308
120522
5601
718900
3031
2

17475
39324
9820
165000
5194
3028
2166
Nicholas County, West Virginia
54067
25930
39171
13031
83100
8540
3509
5031
Ohio County, West Virginia
54069
43637
40569
21097
107500
12583
6041
6542
Pendleton County, West Virginia
54071
7402
36953
5161
100500
2430
816
1614
Pleasants County, West Virginia
54073
7636
44288
3382
100800
2407
1124
1283
Pocahontas County, West Virginia
54075
8697
36827
8841
115500
3013
1274
1739
Preston County, West Virginia
54077
33809
45064
15060
107100
9943
4471
5472
Putnam County, West Virginia
54079
56596
56774
23699
148600
17999
10060
7939
Raleigh County, West Virginia
54081
78493
41032
36000
101700
22779
10924
11855
Randolph County, West Virginia
54083
29365
39457
14173
101200
8435
3936
4499
Ritchie County, West Virginia
54085
10140
37636
5816
73100
3130
1263
1867
Roane County, West Virginia
54087
14636
31813
7379
88900
4486
1829
2657
Summers County, West Virginia
54089
13544
36651
7678
89200
4374
1739
2635
Taylor County, West Virginia
54091
16977
43970
7521
88

We now have all of the data that was nested in these tags saved to a Python list. Access an element with data[x], where x is its position in the list. Python lists are zero-indexed, so the first element is data[0] and the eighth is data[7].

In [17]:
print(data[0])
print(data[1])
print(data[0].string)
print(data[1].string)

<td class="name">Autauga County, Alabama</td>
<td class="fips">01001</td>
Autauga County, Alabama
01001
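Because the scraped list is flat, with nine cells per county in a fixed order, zero-based slicing is enough to pull out one county's record. The following is a minimal sketch rather than part of the original tutorial: plain strings stand in for the `.string` values of the tags, and the names `cells`, `FIELDS_PER_ROW`, and `record` are hypothetical.

```python
# Plain strings stand in for the .string values of the scraped <td> tags.
cells = [
    "Autauga County, Alabama", "01001", "55221", "51281", "22582",
    "141300", "15077", "9668", "5409",
    "Baldwin County, Alabama", "01003", "195121", "50254", "106422",
    "169300", "52997", "31824", "21173",
]

FIELDS_PER_ROW = 9  # hypothetical constant: name, fips, tot-pop, ..., house-wo-debt


def record(flat, i, width=FIELDS_PER_ROW):
    """Return the i-th record (0-based) from a flat list of cells."""
    start = i * width
    return flat[start:start + width]


first = record(cells, 0)
second = record(cells, 1)
print(first[0], first[1])    # name and FIPS code of the first county
print(second[0], second[1])  # name and FIPS code of the second county
```

The same slicing works on the real `data` list, since a BeautifulSoup result set can be indexed and sliced like a list.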


In [18]:
import requests
import bs4

# load and get the website
response = requests.get('http://duspviz.mit.edu/_assets/data/county_housing_stats.html')

# create the soup
soup = bs4.BeautifulSoup(response.text, "html.parser")

# find all the tags whose class is one of the table's column classes
data = soup.find_all(attrs={'class':['name','fips','tot-pop','median-income','no-housing-units','med-home-val','owner-occupied','house-w-debt','house-wo-debt']})

# print 'data' to console
print(data)

[<td class="name">Autauga County, Alabama</td>, <td class="fips">01001</td>, <td class="tot-pop">55221</td>, <td class="median-income">51281</td>, <td class="no-housing-units">22582</td>, <td class="med-home-val">141300</td>, <td class="owner-occupied">15077</td>, <td class="house-w-debt">9668</td>, <td class="house-wo-debt">5409</td>, <td class="name">Baldwin County, Alabama</td>, <td class="fips">01003</td>, <td class="tot-pop">195121</td>, <td class="median-income">50254</td>, <td class="no-housing-units">106422</td>, <td class="med-home-val">169300</td>, <td class="owner-occupied">52997</td>, <td class="house-w-debt">31824</td>, <td class="house-wo-debt">21173</td>, <td class="name">Barbour County, Alabama</td>, <td class="fips">01005</td>, <td class="tot-pop">26932</td>, <td class="median-income">32964</td>, <td class="no-housing-units">11810</td>, <td class="med-home-val">92200</td>, <td class="owner-occupied">5864</td>, <td class="house-w-debt">2691</td>, <td class="house-wo-deb

You should see a list of tags with our data values nested inside them. This is what we want!

In [19]:
f = open('county_data.csv','w') # open new file, make sure path to your data file is correct

p = 0 # initial place in array
l = len(data)-1 # length of array minus one


f.write("County, State, FIPS Code, Total Pop, Median Income ($), No. of Housing Units, Median Home Value ($), No. of Owner Occupied Housing Units, No. of Owner Occ. Housing Units with Debt, No. of Owner Occ. Housing Units without Debt\n") #write headers


while p < l: # while place is less than length
    f.write(data[p].string + ", ") # write name (its own comma splits it into the County and State columns) and add comma
    p = p + 1 # increment
    f.write(data[p].string + ", ") # write FIPS and add comma
    p = p + 1 # increment
    f.write(data[p].string + ", ") # write Total Pop and add comma
    p = p + 1 # increment
    f.write(data[p].string + ", ") # write Median Income and add comma
    p = p + 1 # increment
    f.write(data[p].string + ", ") # write No. of Housing Units and add comma
    p = p + 1 # increment
    f.write(data[p].string + ", ") # write Median Home Value and add comma
    p = p + 1 # increment
    f.write(data[p].string + ", ") # write No. of Owner Occupied Housing Units and add comma
    p = p + 1 # increment
    f.write(data[p].string + ", ") # write No. of Owner Occ. Housing Units with Debt and add comma
    p = p + 1 # increment
    f.write(data[p].string + "\n") # write No. of Owner Occ. Housing Units without Debt and line break
    p = p + 1 # increment

    
f.close() # close file
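The loop above relies on the comma inside each county name to fill both the County and State columns. As an alternative sketch, not the tutorial's own method, Python's standard `csv` module can write each group of nine cells as one row and quote any field that contains a comma. This assumes Python 3; `HEADER`, `write_rows`, and the in-memory `buffer` are hypothetical names, and the first header label is merged into a single column for illustration.

```python
import csv
import io

# Hypothetical header: nine labels, one per scraped cell class.
HEADER = [
    "County", "FIPS Code", "Total Pop", "Median Income ($)",
    "No. of Housing Units", "Median Home Value ($)",
    "No. of Owner Occupied Housing Units",
    "No. of Owner Occ. Housing Units with Debt",
    "No. of Owner Occ. Housing Units without Debt",
]


def write_rows(cells, out, width=9):
    """Write a flat list of cell strings to `out` as CSV, `width` cells per row."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    # Step through the flat list one complete record at a time.
    for start in range(0, len(cells) - width + 1, width):
        writer.writerow(cells[start:start + width])


# Plain strings stand in for [tag.string for tag in data].
cells = [
    "Autauga County, Alabama", "01001", "55221", "51281", "22582",
    "141300", "15077", "9668", "5409",
]

buffer = io.StringIO()  # swap for open('county_data.csv', 'w', newline='') to write a file
write_rows(cells, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the writer quotes the county name, a CSV reader gets it back as a single field instead of two.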


## JSON & Working with Web APIs
Web APIs are a more convenient way for programs to interact with websites. Many websites now have an API that gives access to their data in JSON format.

In [20]:
import json

a = {'a': 1, 'b':2}
s = json.dumps(a)
a2 = json.loads(s)

In [21]:
print(a) # a dictionary
print(s) # s is a string containing a in JSON encoding
print(a2) # a2 is the dictionary decoded back from s; under Python 2 its keys are unicode strings

{'a': 1, 'b': 2}
{"a": 1, "b": 2}
{u'a': 1, u'b': 2}
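The round trip also works for nested structures, which is the shape most web APIs return. The sketch below uses a small hand-written record (the variable names are hypothetical) and shows one thing worth knowing about the encoding: JSON object keys are always strings, so a non-string dictionary key does not survive the round trip unchanged.

```python
import json

# A hand-written nested record, similar in shape to what a web API might return.
match = {
    "status": "completed",
    "home_team": {"country": "France", "goals": 4},
    "away_team": {"country": "Korea Republic", "goals": 0},
}

encoded = json.dumps(match, indent=2, sort_keys=True)  # pretty-printed JSON text
decoded = json.loads(encoded)                           # back to Python dicts

print(encoded)
print(decoded == match)  # nested dicts, strings, and ints round-trip intact

# JSON object keys are strings, so an integer key comes back as a string.
int_keyed = {1: "one"}
round_tripped = json.loads(json.dumps(int_keyed))
print(round_tripped)
```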


## World Cup in JSON!

The [2014 FIFA World Cup](http://en.wikipedia.org/wiki/2014_FIFA_World_Cup) was held in Brazil at several different venues. An [API created for the World Cup](http://worldcup.sfg.io) scraped current match results and served the match data as JSON. Possible output includes events such as goals, substitutions, and cards. The [actual matches are listed here](http://worldcup.sfg.io/matches) in JSON. The sample output in this section carries June 2019 dates and venues in France, so it reflects the tournament the API was serving when the cells were last run.

* Example from [Fernando Masanori](https://gist.github.com/fmasanori/1288160dad16cc473a53)

In [22]:
import requests

url = "http://worldcup.sfg.io/matches"
resp = requests.get(url)
wc = resp.json()
wc
#wc = json.loads(data.decode('utf-8'))

[{u'attendance': u'45261',
  u'away_team': {u'code': u'KOR',
   u'country': u'Korea Republic',
   u'goals': 0,
   u'penalties': 0},
  u'away_team_country': u'Korea Republic',
  u'away_team_events': [{u'id': 6,
    u'player': u'KANG Yumi',
    u'time': u"52'",
    u'type_of_event': u'substitution-out'},
   {u'id': 7,
    u'player': u'KANG Chaerim',
    u'time': u"52'",
    u'type_of_event': u'substitution-in'},
   {u'id': 8,
    u'player': u'LEE Youngju',
    u'time': u"69'",
    u'type_of_event': u'substitution-out'},
   {u'id': 9,
    u'player': u'LEE Mina',
    u'time': u"69'",
    u'type_of_event': u'substitution-in'},
   {u'id': 17,
    u'player': u'JUNG Seolbin',
    u'time': u"86'",
    u'type_of_event': u'substitution-out'},
   {u'id': 18,
    u'player': u'YEO Minji',
    u'time': u"86'",
    u'type_of_event': u'substitution-in'}],
  u'away_team_statistics': {u'attempts_on_goal': 4,
   u'ball_possession': 40,
   u'blocked': 1,
   u'clearances': 17,
   u'corners': 1,
   u'country

In [23]:
"Number of matches in 2014 World Cup: %i" % len(wc)

'Number of matches in 2014 World Cup: 52'

In [24]:
# Print keys in first match
gameIndex = 0
wc[gameIndex].keys()

[u'datetime',
 u'officials',
 u'home_team_country',
 u'away_team_events',
 u'attendance',
 u'winner_code',
 u'winner',
 u'away_team_statistics',
 u'location',
 u'weather',
 u'status',
 u'stage_name',
 u'last_event_update_at',
 u'away_team',
 u'home_team_events',
 u'home_team_statistics',
 u'home_team',
 u'last_score_update_at',
 u'away_team_country',
 u'venue',
 u'fifa_id',
 u'time']

In [25]:
wc[gameIndex]['status']

u'completed'

In [26]:
wc[gameIndex]['home_team']

{u'code': u'FRA', u'country': u'France', u'goals': 4, u'penalties': 0}

In [27]:
for elem in wc:
    print(elem['home_team']['country'], elem['home_team']['goals'], elem['away_team']['country'], elem['away_team']['goals'])

(u'France', 4, u'Korea Republic', 0)
(u'Germany', 1, u'China PR', 0)
(u'Spain', 3, u'South Africa', 1)
(u'Norway', 3, u'Nigeria', 0)
(u'Brazil', 3, u'Jamaica', 0)
(u'England', 2, u'Scotland', 1)
(u'Australia', 1, u'Italy', 2)
(u'Argentina', 0, u'Japan', 0)
(u'Canada', 1, u'Cameroon', 0)
(u'New Zealand', 0, u'Netherlands', 1)
(u'Chile', 0, u'Sweden', 2)
(u'USA', 13, u'Thailand', 0)
(u'Nigeria', 2, u'Korea Republic', 0)
(u'Germany', 1, u'Spain', 0)
(u'France', 2, u'Norway', 1)
(u'Australia', 3, u'Brazil', 2)
(u'South Africa', 0, u'China PR', 1)
(u'Japan', 2, u'Scotland', 1)
(u'Jamaica', 0, u'Italy', 5)
(u'England', 1, u'Argentina', 0)
(u'Netherlands', 3, u'Cameroon', 1)
(u'Canada', 2, u'New Zealand', 0)
(u'Sweden', 5, u'Thailand', 1)
(u'USA', 3, u'Chile', 0)
(u'China PR', 0, u'Spain', 0)
(u'South Africa', 0, u'Germany', 4)
(u'Nigeria', 0, u'France', 1)
(u'Korea Republic', 1, u'Norway', 2)
(u'Italy', 0, u'Brazil', 1)
(u'Jamaica', 1, u'Australia', 4)
(u'Japan', 0, u'England', 2)
(u'Scotlan
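Since each match is a dictionary with nested `home_team` and `away_team` dictionaries, totals per team can be accumulated in a single pass. This is a minimal sketch on three hand-copied records from the output above, so it runs without the API; `matches` and `goals` are hypothetical names, and the real `wc` list could be passed through the same loop.

```python
from collections import Counter

# Three records copied from the printed results, reduced to the fields used here.
matches = [
    {"home_team": {"country": "France", "goals": 4},
     "away_team": {"country": "Korea Republic", "goals": 0}},
    {"home_team": {"country": "Germany", "goals": 1},
     "away_team": {"country": "China PR", "goals": 0}},
    {"home_team": {"country": "Nigeria", "goals": 0},
     "away_team": {"country": "France", "goals": 1}},
]

goals = Counter()
for match in matches:
    # Both sides of a match have the same structure, so handle them alike.
    for side in ("home_team", "away_team"):
        team = match[side]
        goals[team["country"]] += team["goals"]

# Teams sorted by goals scored, highest first.
for country, total in goals.most_common():
    print(country, total)
```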

### Create a pandas DataFrame from JSON

In [28]:
data = pd.DataFrame(wc, columns = ['match_number', 'location', 'datetime', 'home_team', 'away_team', 'winner', 'home_team_events', 'away_team_events'])
data.head()

Unnamed: 0,match_number,location,datetime,home_team,away_team,winner,home_team_events,away_team_events
0,,Parc des Princes,2019-06-07T19:00:00Z,"{u'country': u'France', u'penalties': 0, u'cod...","{u'country': u'Korea Republic', u'penalties': ...",France,"[{u'type_of_event': u'goal', u'player': u'Euge...","[{u'type_of_event': u'substitution-out', u'pla..."
1,,Roazhon Park,2019-06-08T13:00:00Z,"{u'country': u'Germany', u'penalties': 0, u'co...","{u'country': u'China PR', u'penalties': 0, u'c...",Germany,"[{u'type_of_event': u'substitution-out', u'pla...","[{u'type_of_event': u'yellow-card', u'player':..."
2,,Stade Océane,2019-06-08T16:00:00Z,"{u'country': u'Spain', u'penalties': 0, u'code...","{u'country': u'South Africa', u'penalties': 0,...",Spain,"[{u'type_of_event': u'substitution-out', u'pla...","[{u'type_of_event': u'goal', u'player': u'Them..."
3,,Stade Auguste-Delaune,2019-06-08T19:00:00Z,"{u'country': u'Norway', u'penalties': 0, u'cod...","{u'country': u'Nigeria', u'penalties': 0, u'co...",Norway,"[{u'type_of_event': u'goal', u'player': u'Guro...","[{u'type_of_event': u'yellow-card', u'player':..."
4,,Stade des Alpes,2019-06-09T13:30:00Z,"{u'country': u'Brazil', u'penalties': 0, u'cod...","{u'country': u'Jamaica', u'penalties': 0, u'co...",Brazil,"[{u'type_of_event': u'goal', u'player': u'CRIS...","[{u'type_of_event': u'yellow-card', u'player':..."


In [29]:
data['gameDate'] = pd.DatetimeIndex(data.datetime).date
data['gameTime'] = pd.DatetimeIndex(data.datetime).time

In [30]:
data.head()

Unnamed: 0,match_number,location,datetime,home_team,away_team,winner,home_team_events,away_team_events,gameDate,gameTime
0,,Parc des Princes,2019-06-07T19:00:00Z,"{u'country': u'France', u'penalties': 0, u'cod...","{u'country': u'Korea Republic', u'penalties': ...",France,"[{u'type_of_event': u'goal', u'player': u'Euge...","[{u'type_of_event': u'substitution-out', u'pla...",2019-06-07,19:00:00
1,,Roazhon Park,2019-06-08T13:00:00Z,"{u'country': u'Germany', u'penalties': 0, u'co...","{u'country': u'China PR', u'penalties': 0, u'c...",Germany,"[{u'type_of_event': u'substitution-out', u'pla...","[{u'type_of_event': u'yellow-card', u'player':...",2019-06-08,13:00:00
2,,Stade Océane,2019-06-08T16:00:00Z,"{u'country': u'Spain', u'penalties': 0, u'code...","{u'country': u'South Africa', u'penalties': 0,...",Spain,"[{u'type_of_event': u'substitution-out', u'pla...","[{u'type_of_event': u'goal', u'player': u'Them...",2019-06-08,16:00:00
3,,Stade Auguste-Delaune,2019-06-08T19:00:00Z,"{u'country': u'Norway', u'penalties': 0, u'cod...","{u'country': u'Nigeria', u'penalties': 0, u'co...",Norway,"[{u'type_of_event': u'goal', u'player': u'Guro...","[{u'type_of_event': u'yellow-card', u'player':...",2019-06-08,19:00:00
4,,Stade des Alpes,2019-06-09T13:30:00Z,"{u'country': u'Brazil', u'penalties': 0, u'cod...","{u'country': u'Jamaica', u'penalties': 0, u'co...",Brazil,"[{u'type_of_event': u'goal', u'player': u'CRIS...","[{u'type_of_event': u'yellow-card', u'player':...",2019-06-09,13:30:00
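The `home_team` and `away_team` columns above still hold whole dictionaries, which makes them awkward to filter or sort on. One possible way to spread them into ordinary columns is `pd.json_normalize`, sketched here on two hand-copied records rather than the live `wc` list. This assumes pandas 1.0 or later, where `json_normalize` is available at the top level; `matches` and `flat` are hypothetical names.

```python
import pandas as pd

# Two records copied from the output above, reduced to a few fields.
matches = [
    {"location": "Parc des Princes", "winner": "France",
     "home_team": {"country": "France", "goals": 4},
     "away_team": {"country": "Korea Republic", "goals": 0}},
    {"location": "Roazhon Park", "winner": "Germany",
     "home_team": {"country": "Germany", "goals": 1},
     "away_team": {"country": "China PR", "goals": 0}},
]

# Nested dictionaries become dotted column names such as 'home_team.goals'.
flat = pd.json_normalize(matches)

print(flat[["location", "home_team.country", "home_team.goals",
            "away_team.country", "away_team.goals"]])
```

With the nested values in their own columns, expressions such as `flat[flat["home_team.goals"] > 2]` work directly.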
