# Introduction to the Pandas Library 

# What is Pandas and why do we need it?
Pandas is a Python library that provides built-in functions and data structures for data analysis and manipulation.

Up to this point, you have learned a number of different data structures. But if you wanted to analyze a dataset that has 100,000 rows of data, storing it in a list, dictionary, or some other basic Python data structure would not be practical.

```
[
  ['TSLA', 483.12, '10-4-2020'],
  ['GOOG', 583.65, '5-26-2020'],
  ['AAPL', 802.38, '1-2-2020'],
  ['MSFT', 113.62, '2-6-2020'],
  ['AMZN', 987.96, '3-30-2020'],
  ['TWTR', 53.12, '6-1-2020'],
  ['GOOG', 303.71, '2-14-2020'],
  ['TSLA', 283.65, '3-26-2020'],
  ['AMZN', 583.65, '7-27-2020']
]
```
For example, suppose you had a dataset stored as a list of lists (in the format shown above) containing stock tickers, their prices, and the dates on which those prices were recorded. To find the average price for a given stock, you would have to do the following:

1. Iterate through your list and select every instance where the stock value matches your search string

2. Store those lists in a separate data structure

3. Index THOSE lists and select the stock price, and then store those prices in yet another data structure 

4. Write a function to iterate over your list of prices, compute and return the average, and then map it back to the stock we were initially searching for!


This would be a ton of code, with many places at which an error could pop up and trip us up.
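The steps above can be sketched in plain Python roughly as follows. This is only an illustration: the helper name `average_price` and the shortened `rows` list are assumptions made for the example, not part of the original dataset.

```python
# Sample rows in the same [ticker, price, date] layout as the list above.
rows = [
    ['TSLA', 483.12, '10-4-2020'],
    ['GOOG', 583.65, '5-26-2020'],
    ['GOOG', 303.71, '2-14-2020'],
    ['TSLA', 283.65, '3-26-2020'],
]


def average_price(rows, ticker):
    """Return the mean price for one ticker, or None if it never appears."""
    # Steps 1-3: keep only the matching rows and pull out their prices.
    prices = [price for name, price, _date in rows if name == ticker]
    if not prices:
        return None
    # Step 4: average the collected prices.
    return sum(prices) / len(prices)


tsla_average = average_price(rows, 'TSLA')
print('TSLA', tsla_average)
```

Even in this compact form, the function only answers the question for one ticker at a time, and every new question (median, maximum, per-month average) needs its own loop.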


**In Pandas, you can do this in one line of code:**

```
dataframe.groupby('stock_name').mean(numeric_only=True)
```
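To see that one-liner run end to end, here is a minimal sketch. It assumes the list of lists has been loaded into a DataFrame with the hypothetical column names `stock_name`, `price`, and `date`. Passing `numeric_only=True` tells pandas to skip the text `date` column, since recent pandas versions can raise an error when asked to average text.

```python
import pandas as pd

# A shortened copy of the stock list; the column names are assumptions.
rows = [
    ['TSLA', 483.12, '10-4-2020'],
    ['GOOG', 583.65, '5-26-2020'],
    ['GOOG', 303.71, '2-14-2020'],
    ['TSLA', 283.65, '3-26-2020'],
]
dataframe = pd.DataFrame(rows, columns=['stock_name', 'price', 'date'])

# Group the rows by ticker, then average every numeric column per group.
average_prices = dataframe.groupby('stock_name').mean(numeric_only=True)
print(average_prices)
```

The result is itself a DataFrame with one row per ticker, so the average for every stock is computed in a single pass.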

You will likely notice that Pandas recreates a lot of functionality you have already learned. We use Pandas' versions anyway because they give us powerful, easy-to-use tools for manipulating data.

## Upload this file to Colab

- `gdpdata.csv`

### Open this file in Excel and review the data.

In [1]:
from google.colab import files
uploaded = files.upload()

Saving gdpdata.csv to gdpdata.csv


## Read the CSV into Python.

- This is a completely different mechanism from what we've done up to now.
- Note: we are NOT using the `csv` module and `csv.DictReader()` to bring the data into Python.

Instead, we need to `import pandas` and then use `read_csv()`.

What data is being created?

In [2]:
# The code in this cell is boilerplate and is just used to import the data and required libraries. 

import pandas as pd

df = pd.read_csv("gdpdata.csv")

print("done")

done
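To answer the question of what `read_csv()` creates without needing the uploaded file, here is a small sketch that reads CSV text from memory instead. The country names and values are made up for illustration, and `io.StringIO` simply stands in for a file on disk.

```python
import io

import pandas as pd

# A tiny in-memory stand-in for a CSV file (made-up values, not from gdpdata.csv).
csv_text = """Country Name,1960,1961
Examplia,100.0,110.0
Samplestan,,95.5
"""

sample_df = pd.read_csv(io.StringIO(csv_text))

print(type(sample_df))  # read_csv() returns a pandas DataFrame
print(sample_df)        # the empty field shows up as NaN
```

The header row becomes the column labels, and any empty field is read in as `NaN`, which is why missing values appear throughout the real dataset.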


## DataFrames

The core data structure in Pandas is the DataFrame.

The easiest way to think about DataFrames is that they are like spreadsheets. They have column headers specifying a variable, and then rows which represent datapoints.

We will now learn how to explore our data using DataFrames, and in turn we will learn more about what DataFrames are and how we can use them.
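As a rough illustration of the spreadsheet analogy, a DataFrame can also be built by hand from a dictionary in which each key is a column header and each list holds that column's values. The names and numbers below are invented for the example.

```python
import pandas as pd

# Each key is a column header; each list holds that column's values.
toy = pd.DataFrame({
    'Country Name': ['Examplia', 'Samplestan', 'Testland'],
    '2018': [1.2e9, 3.4e9, 5.6e8],
    '2019': [1.3e9, None, 6.0e8],  # None becomes NaN in a numeric column
})

print(toy)
print(toy['2019'])  # selecting one column gives a Series
print(toy.loc[0])   # selecting one row by its index label
```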

# Code-along

When you first instantiate a DataFrame, it is often a good idea to get a sense for the data before you proceed into any more serious analytical work.

One recommended process goes as follows:
1. Check the shape
2. Check the columns
3. Peek at the head
4. Get summary stats for each of the metrics
5. Clean the dataset

In [3]:
# (1) Check the shape
df.shape

(264, 64)

The output from the shape property is a tuple containing the number of rows, followed by the number of columns.
This example data contains 264 rows of data, and 64 columns.
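Because `shape` is an ordinary tuple, it can be unpacked into two variables. The sketch below uses a small made-up DataFrame rather than the GDP data.

```python
import pandas as pd

toy = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# shape is a (rows, columns) tuple, so it unpacks like any other tuple.
n_rows, n_cols = toy.shape
print(f"{n_rows} rows, {n_cols} columns")

# len() on a DataFrame counts rows only.
print(len(toy))
```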

In [4]:
# (2) Check the columns
df.columns

Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
       '1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
       '1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
       '1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
       '1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
       '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
       '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
       '2014', '2015', '2016', '2017', '2018', '2019'],
      dtype='object')

The output of the `columns` property is an `Index` object listing the column labels in our DataFrame. As we can see, there are four attributes given to each country, followed by a column for each year from 1960 to 2019. Note the type of each column header: the year labels are strings (such as `'1960'`), not integers. This will be important later if we want to perform manipulations by selecting column names.
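A short sketch of why the label type matters, using a made-up DataFrame whose year columns are strings like those in the output above:

```python
import pandas as pd

toy = pd.DataFrame({
    'Country Name': ['Examplia'],
    '1960': [100.0],
    '1961': [110.0],
})

print(toy.columns.tolist())

# The year labels are strings, so the string '1960' selects the column...
print(toy['1960'])

# ...while the integer 1960 is a different label and is not found.
try:
    toy[1960]
except KeyError:
    print("KeyError: 1960 is not a column label")

# One way to pick out just the year columns by their labels.
year_columns = [label for label in toy.columns if label.isdigit()]
print(year_columns)
```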

In [5]:
# (3) Peek at the head
df.head(5)

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,Aruba,ABW,GDP (current US$),NY.GDP.MKTP.CD,,,,,,,,,,,,,,,,,,,,,,,,,,,405463400.0,487602500.0,596423600.0,695304400.0,764887100.0,872138700.0,958463200.0,1082980000.0,1245688000.0,1320475000.0,1379961000.0,1531944000.0,1665101000.0,1722799000.0,1873453000.0,1920112000.0,1941341000.0,2021229000.0,2228492000.0,2330726000.0,2424581000.0,2615084000.0,2745251000.0,2498883000.0,2390503000.0,2549721000.0,2534637000.0,2701676000.0,2765363000.0,2919553000.0,2965922000.0,3056425000.0,,
1,Afghanistan,AFG,GDP (current US$),NY.GDP.MKTP.CD,537777811.1,548888895.6,546666677.8,751111191.1,800000044.4,1006667000.0,1400000000.0,1673333000.0,1373333000.0,1408889000.0,1748887000.0,1831109000.0,1595555000.0,1733333000.0,2155555000.0,2366667000.0,2555556000.0,2953333000.0,3300000000.0,3697940000.0,3641723000.0,3478788000.0,,,,,,,,,,,,,,,,,,,,,4055180000.0,4515559000.0,5226779000.0,6209138000.0,6971286000.0,9747880000.0,10109230000.0,12439090000.0,15856570000.0,17804290000.0,20001600000.0,20561070000.0,20484890000.0,19907110000.0,19362640000.0,20191760000.0,19484380000.0,19101350000.0
2,Angola,AGO,GDP (current US$),NY.GDP.MKTP.CD,,,,,,,,,,,,,,,,,,,,,5930503000.0,5550483000.0,5550483000.0,5784342000.0,6131475000.0,7553560000.0,7072063000.0,8083872000.0,8769251000.0,10201100000.0,11228760000.0,10603780000.0,8307811000.0,5768720000.0,4438321000.0,5538749000.0,7526447000.0,7648377000.0,6506230000.0,6152923000.0,9129595000.0,8936064000.0,15285590000.0,17812710000.0,23552050000.0,36970920000.0,52381010000.0,65266450000.0,88538610000.0,70307160000.0,83799500000.0,111789700000.0,128052900000.0,136709900000.0,145712200000.0,116193600000.0,101123900000.0,122123800000.0,101353200000.0,94635420000.0
3,Albania,ALB,GDP (current US$),NY.GDP.MKTP.CD,,,,,,,,,,,,,,,,,,,,,,,,,1857338000.0,1897050000.0,2097326000.0,2080796000.0,2051236000.0,2253090000.0,2028554000.0,1099559000.0,652175000.0,1185315000.0,1880952000.0,2392765000.0,3199643000.0,2258516000.0,2545967000.0,3212119000.0,3480355000.0,3922099000.0,4348070000.0,5611492000.0,7184681000.0,8052076000.0,8896074000.0,10677320000.0,12881350000.0,12044220000.0,11926930000.0,12890770000.0,12319830000.0,12776220000.0,13228140000.0,11386850000.0,11861200000.0,13019690000.0,15147020000.0,15278080000.0
4,Andorra,AND,GDP (current US$),NY.GDP.MKTP.CD,,,,,,,,,,,78619210.0,89409820.0,113408200.0,150820100.0,186558700.0,220127200.0,227281000.0,254020200.0,308008900.0,411578300.0,446416100.0,388958700.0,375896000.0,327861800.0,330070700.0,346738000.0,482000600.0,611316400.0,721425900.0,795449300.0,1029048000.0,1106929000.0,1210014000.0,1007026000.0,1017549000.0,1178739000.0,1223945000.0,1180597000.0,1211932000.0,1239876000.0,1429049000.0,1546926000.0,1755910000.0,2361727000.0,2894922000.0,3159905000.0,3456442000.0,3952601000.0,4085631000.0,3674410000.0,3449967000.0,3629204000.0,3188809000.0,3193704000.0,3271808000.0,2789870000.0,2896679000.0,3000181000.0,3218316000.0,3154058000.0


The `head(n)` method takes the number of rows you want to see as an argument, and returns the first `n` rows of the DataFrame. 

What sticks out about this dataset from first looking at the head?

- There are a significant number of `NaN` values
- There appear to be redundant or possibly not useful columns
- The data appears to be sorted alphabetically by country name
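The first observation can be checked with numbers rather than by eye. As a sketch on a made-up DataFrame, `isna().sum()` counts the missing values in each column, and `tail(n)` is the counterpart of `head(n)` for the last rows.

```python
import pandas as pd

toy = pd.DataFrame({
    'Country Name': ['A', 'B', 'C', 'D'],
    '1960': [None, 2.0, None, 4.0],
    '2019': [1.0, 2.0, 3.0, None],
})

first_two = toy.head(2)  # first 2 rows
last_two = toy.tail(2)   # last 2 rows

# isna() marks each missing cell as True; sum() counts the Trues per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```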

In [6]:
# (4) Get summary stats for each of the metrics
df.describe()

Unnamed: 0,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
count,131.0,132.0,135.0,135.0,135.0,144.0,148.0,151.0,155.0,155.0,167.0,168.0,168.0,168.0,169.0,172.0,173.0,176.0,175.0,176.0,187.0,190.0,191.0,192.0,193.0,195.0,197.0,201.0,206.0,206.0,223.0,222.0,223.0,228.0,230.0,236.0,236.0,236.0,238.0,239.0,245.0,245.0,250.0,250.0,251.0,251.0,252.0,252.0,251.0,251.0,252.0,253.0,252.0,252.0,251.0,250.0,249.0,249.0,241.0,223.0
mean,71062710000.0,72485010000.0,76027240000.0,81886470000.0,89912730000.0,92036210000.0,105099900000.0,109193400000.0,114747700000.0,126926700000.0,130782500000.0,143437700000.0,165310800000.0,203619400000.0,236340300000.0,259119400000.0,278876800000.0,309628200000.0,363924400000.0,423159100000.0,452154300000.0,461902300000.0,453287200000.0,453498000000.0,463581200000.0,481461300000.0,558101800000.0,619214900000.0,684076400000.0,711759600000.0,749295000000.0,795670300000.0,836578200000.0,832163800000.0,884919800000.0,962560400000.0,991423400000.0,990166100000.0,979697600000.0,1001606000000.0,1008856000000.0,1006928000000.0,1025444000000.0,1157833000000.0,1310268000000.0,1435580000000.0,1570496000000.0,1799797000000.0,2017206000000.0,1915261000000.0,2117374000000.0,2368250000000.0,2444180000000.0,2533431000000.0,2619317000000.0,2475339000000.0,2514951000000.0,2693083000000.0,2947814000000.0,3237115000000.0
std,212839600000.0,221342500000.0,235620900000.0,253375100000.0,277056000000.0,292632000000.0,320976100000.0,339685500000.0,362880600000.0,399471100000.0,421433700000.0,465318900000.0,538028800000.0,653922600000.0,741441200000.0,822208300000.0,889241700000.0,996463500000.0,1187582000000.0,1374255000000.0,1495071000000.0,1514566000000.0,1497098000000.0,1531944000000.0,1590154000000.0,1665035000000.0,1982542000000.0,2255723000000.0,2499220000000.0,2609341000000.0,2837662000000.0,3005626000000.0,3207558000000.0,3204355000000.0,3420071000000.0,3752874000000.0,3809506000000.0,3766889000000.0,3766834000000.0,3915365000000.0,3964425000000.0,3942731000000.0,4067469000000.0,4565973000000.0,5106077000000.0,5469270000000.0,5858319000000.0,6547161000000.0,7136287000000.0,6748604000000.0,7297244000000.0,8061724000000.0,8240523000000.0,8484106000000.0,8736171000000.0,8296085000000.0,8442963000000.0,8983625000000.0,9713264000000.0,10221860000000.0
min,12012030.0,11592020.0,9122751.0,10840100.0,12712470.0,13593930.0,14469080.0,15835180.0,14600000.0,15850000.0,14295280.0,15278630.0,18936530.0,24196020.0,31514860.0,32506740.0,30036420.0,34139390.0,41567470.0,42620170.0,38715550.0,31020000.0,34918000.0,37837840.0,41246160.0,32125150.0,32085560.0,33608740.0,42972110.0,41119720.0,8824448.0,9365166.0,9742949.0,9630763.0,10886830.0,11025950.0,12334850.0,12700910.0,12757630.0,13687140.0,13742060.0,13196540.0,15450990.0,18231080.0,21534930.0,21839100.0,22902860.0,27030370.0,30290220.0,27101080.0,31823520.0,38711810.0,37671770.0,37509080.0,37290610.0,35492070.0,36547800.0,40619250.0,42588160.0,47271460.0
25%,507924100.0,489190200.0,505458700.0,514025000.0,562697400.0,586371600.0,642702600.0,626490900.0,644007100.0,683482000.0,566517100.0,641778600.0,589531900.0,776108500.0,1073577000.0,1133206000.0,1139394000.0,1110879000.0,1427121000.0,1578531000.0,1394526000.0,1439725000.0,1370498000.0,1326363000.0,1459880000.0,1473906000.0,1776842000.0,2041538000.0,2226711000.0,2254301000.0,2660048000.0,2731151000.0,2691025000.0,2564262000.0,2525683000.0,2703252000.0,3048080000.0,3201824000.0,3123006000.0,3344626000.0,2956746000.0,2833443000.0,3004632000.0,3402464000.0,3828000000.0,4406839000.0,4703042000.0,5647488000.0,6293209000.0,6012022000.0,6933109000.0,7625069000.0,8045091000.0,8475925000.0,9311996000.0,8742043000.0,8734162000.0,9669760000.0,12207120000.0,13968380000.0
50%,2760747000.0,2966849000.0,2814319000.0,3540403000.0,3405333000.0,2952341000.0,3157019000.0,3370843000.0,3909781000.0,4460700000.0,4401259000.0,4776612000.0,5941910000.0,7246266000.0,9388664000.0,9771594000.0,10117110000.0,11155850000.0,13281770000.0,15738810000.0,14394930000.0,14532770000.0,16084250000.0,13629090000.0,11594000000.0,12403730000.0,10621160000.0,11356220000.0,11701110000.0,10765780000.0,12308620000.0,11906800000.0,12452280000.0,13219940000.0,12951050000.0,13901210000.0,14352870000.0,15421890000.0,15211430000.0,15710150000.0,13760510000.0,13581640000.0,14436300000.0,17421260000.0,20662530000.0,21497340000.0,24214120000.0,32154220000.0,35895150000.0,37021510000.0,40811250000.0,43466130000.0,46526790000.0,52274310000.0,55348000000.0,50285040000.0,51596970000.0,54726600000.0,59596890000.0,76085850000.0
75%,29853720000.0,30981330000.0,31924560000.0,36814300000.0,35038460000.0,26538800000.0,31754820000.0,31119480000.0,32554200000.0,36363010000.0,39578860000.0,44721720000.0,52299690000.0,63921020000.0,87243410000.0,97496120000.0,94947150000.0,102883500000.0,115169600000.0,135866600000.0,145933500000.0,164390000000.0,154351900000.0,134312900000.0,119624900000.0,119227400000.0,131659300000.0,147540700000.0,175680500000.0,164928600000.0,173174600000.0,185738500000.0,183150600000.0,189166400000.0,176126900000.0,188367600000.0,199572900000.0,206147800000.0,204498400000.0,193492800000.0,196799800000.0,197337900000.0,195626500000.0,226167900000.0,257789500000.0,307504000000.0,344954700000.0,404694000000.0,486239300000.0,432817400000.0,482303000000.0,530163300000.0,547607700000.0,531182000000.0,556944000000.0,515619500000.0,515654700000.0,541018700000.0,555455400000.0,776116700000.0
max,1369434000000.0,1425105000000.0,1530058000000.0,1648294000000.0,1805661000000.0,1966264000000.0,2133331000000.0,2270936000000.0,2451432000000.0,2704633000000.0,2960864000000.0,3273287000000.0,3777533000000.0,4609093000000.0,5315855000000.0,5920221000000.0,6438082000000.0,7277443000000.0,8584497000000.0,9971137000000.0,11227550000000.0,11623790000000.0,11514480000000.0,11747030000000.0,12179890000000.0,12793340000000.0,15118510000000.0,17200990000000.0,19244140000000.0,20087430000000.0,22626370000000.0,23966560000000.0,25452880000000.0,25857860000000.0,27770700000000.0,30886560000000.0,31572630000000.0,31458070000000.0,31393290000000.0,32561770000000.0,33618620000000.0,33426580000000.0,34709810000000.0,38944810000000.0,43867140000000.0,47517230000000.0,51502020000000.0,58031540000000.0,63675550000000.0,60395540000000.0,66113120000000.0,73448340000000.0,75146000000000.0,77302020000000.0,79450810000000.0,75198760000000.0,76335800000000.0,81229180000000.0,86357070000000.0,87697520000000.0


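The `describe()` method outputs summary statistics for each numeric column, giving us a high-level understanding of our data. Text columns such as `Country Name` are left out by default, and the `count` row only counts the non-`NaN` values in each column.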

What patterns do you notice in the data?
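One pattern worth checking is how `count` relates to missing data. In this sketch on a made-up DataFrame, the column with a `NaN` reports a lower count, and the text column does not appear in the summary at all.

```python
import pandas as pd

toy = pd.DataFrame({
    'Country Name': ['A', 'B', 'C'],
    '1960': [10.0, None, 30.0],
    '2019': [100.0, 200.0, 300.0],
})

summary = toy.describe()
print(summary)

# The 'count' row reports non-missing values per column.
print(summary.loc['count'])
```

Read this way, the rising `count` values across the years in the output above suggest that more countries have recorded GDP figures in later years.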

In [7]:
# (5) Clean the dataset

# The process of cleaning data is not as straightforward as the other concepts touched on so far. This module
# will show two simple approaches to dealing with NaNs, though there are other approaches for more advanced 
# programmers that you will be able to tap into once you learn the basics.

# dropna() can be applied to a DataFrame and returns a new DataFrame without any row that contains a NaN value.
# The original df is not modified unless you assign the result back.
# Run this cell to see the output

df.dropna()

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
11,Australia,AUS,GDP (current US$),NY.GDP.MKTP.CD,1.857767e+10,1.965282e+10,1.989249e+10,2.150745e+10,2.376414e+10,2.593684e+10,2.726845e+10,3.039758e+10,3.266547e+10,3.662896e+10,4.127114e+10,4.514951e+10,5.196729e+10,6.373735e+10,8.883125e+10,9.717056e+10,1.049212e+11,1.102019e+11,1.183386e+11,1.347120e+11,1.497749e+11,1.766423e+11,1.937703e+11,1.770304e+11,1.932422e+11,1.802347e+11,1.820369e+11,1.890603e+11,2.356592e+11,2.992680e+11,3.107772e+11,3.253104e+11,3.248789e+11,3.115444e+11,3.222117e+11,3.672164e+11,4.003027e+11,4.345680e+11,3.988991e+11,3.886082e+11,4.152226e+11,3.783761e+11,3.946489e+11,4.664881e+11,6.124904e+11,6.934078e+11,7.460542e+11,8.530996e+11,1.053996e+12,9.278052e+11,1.146138e+12,1.396650e+12,1.546152e+12,1.576184e+12,1.467484e+12,1.351694e+12,1.208847e+12,1.330136e+12,1.433904e+12,1.392681e+12
12,Austria,AUT,GDP (current US$),NY.GDP.MKTP.CD,6.592694e+09,7.311750e+09,7.756110e+09,8.374175e+09,9.169984e+09,9.994071e+09,1.088768e+10,1.157943e+10,1.244063e+10,1.358280e+10,1.537301e+10,1.785849e+10,2.205961e+10,2.951547e+10,3.518930e+10,4.005921e+10,4.295998e+10,5.154576e+10,6.205226e+10,7.393730e+10,8.205891e+10,7.103423e+10,7.127529e+10,7.212102e+10,6.798535e+10,6.938677e+10,9.903617e+10,1.241684e+11,1.333394e+11,1.331058e+11,1.664634e+11,1.737942e+11,1.950781e+11,1.903797e+11,2.035352e+11,2.410383e+11,2.372509e+11,2.127903e+11,2.182599e+11,2.171858e+11,1.967998e+11,1.973379e+11,2.133778e+11,2.616958e+11,3.009042e+11,3.159744e+11,3.359986e+11,3.886914e+11,4.302943e+11,4.001723e+11,3.918927e+11,4.311203e+11,4.094252e+11,4.300687e+11,4.419961e+11,3.818176e+11,3.952277e+11,4.183162e+11,4.555083e+11,4.463147e+11
14,Burundi,BDI,GDP (current US$),NY.GDP.MKTP.CD,1.960000e+08,2.030000e+08,2.135000e+08,2.327500e+08,2.607500e+08,1.589950e+08,1.654446e+08,1.782971e+08,1.832000e+08,1.902057e+08,2.427326e+08,2.528423e+08,2.468046e+08,3.043398e+08,3.452635e+08,4.209867e+08,4.484128e+08,5.475356e+08,6.102256e+08,7.824967e+08,9.197267e+08,9.690467e+08,1.013222e+09,1.082926e+09,9.871439e+08,1.149979e+09,1.201725e+09,1.131466e+09,1.082403e+09,1.113924e+09,1.132101e+09,1.167398e+09,1.083038e+09,9.386326e+08,9.250306e+08,1.000428e+09,8.690339e+08,9.728963e+08,8.937708e+08,8.080772e+08,8.704861e+08,8.767947e+08,8.253945e+08,7.846544e+08,9.152573e+08,1.117113e+09,1.273375e+09,1.356199e+09,1.611836e+09,1.781455e+09,2.032135e+09,2.235821e+09,2.333308e+09,2.451625e+09,2.705783e+09,3.104395e+09,2.959641e+09,3.172292e+09,3.036932e+09,3.012335e+09
15,Belgium,BEL,GDP (current US$),NY.GDP.MKTP.CD,1.165872e+10,1.240015e+10,1.326402e+10,1.426002e+10,1.596011e+10,1.737146e+10,1.865188e+10,1.999204e+10,2.137635e+10,2.371074e+10,2.670620e+10,2.982166e+10,3.720942e+10,4.774380e+10,5.603308e+10,6.567819e+10,7.111388e+10,8.283991e+10,1.012465e+11,1.163155e+11,1.268293e+11,1.047300e+11,9.209593e+10,8.718424e+10,8.334953e+10,8.626826e+10,1.200188e+11,1.493944e+11,1.622991e+11,1.642211e+11,2.053317e+11,2.105110e+11,2.347817e+11,2.247218e+11,2.448841e+11,2.880256e+11,2.792014e+11,2.527081e+11,2.585283e+11,2.581585e+11,2.362045e+11,2.365413e+11,2.571578e+11,3.173817e+11,3.685370e+11,3.855709e+11,4.079181e+11,4.703243e+11,5.152235e+11,4.813459e+11,4.809516e+11,5.226455e+11,4.961813e+11,5.216427e+11,5.346781e+11,4.621497e+11,4.759009e+11,5.037888e+11,5.426859e+11,5.296067e+11
16,Benin,BEN,GDP (current US$),NY.GDP.MKTP.CD,2.261956e+08,2.356682e+08,2.364349e+08,2.539276e+08,2.698190e+08,2.899087e+08,3.029253e+08,3.062220e+08,3.263231e+08,3.307482e+08,3.336278e+08,3.350730e+08,4.103319e+08,5.043760e+08,5.546548e+08,6.768701e+08,6.984082e+08,7.500497e+08,9.288433e+08,1.186231e+09,1.405252e+09,1.291120e+09,1.267778e+09,1.095348e+09,1.051134e+09,1.045713e+09,1.336102e+09,1.562412e+09,1.620246e+09,1.502294e+09,1.959965e+09,1.986438e+09,1.695315e+09,2.274558e+09,1.598076e+09,2.169627e+09,2.361117e+09,2.268302e+09,2.455093e+09,3.676046e+09,3.511249e+09,3.663018e+09,4.174635e+09,5.337267e+09,6.179176e+09,6.565043e+09,7.027863e+09,8.158258e+09,9.748277e+09,9.699587e+09,9.535344e+09,1.069332e+10,1.114136e+10,1.251785e+10,1.328453e+10,1.138816e+10,1.182107e+10,1.270166e+10,1.425099e+10,1.439071e+10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
251,St. Vincent and the Grenadines,VCT,GDP (current US$),NY.GDP.MKTP.CD,1.306656e+07,1.399988e+07,1.452488e+07,1.370822e+07,1.475821e+07,1.510821e+07,1.609987e+07,1.583518e+07,1.535000e+07,1.665000e+07,1.845000e+07,2.005165e+07,2.758549e+07,3.016537e+07,3.292422e+07,3.323716e+07,3.279248e+07,4.935316e+07,6.084477e+07,7.109636e+07,8.234034e+07,1.020865e+08,1.137592e+08,1.222553e+08,1.350250e+08,1.456417e+08,1.608467e+08,1.755806e+08,2.007267e+08,2.147450e+08,2.403653e+08,2.548296e+08,2.779541e+08,2.863078e+08,2.894385e+08,3.160085e+08,3.314897e+08,3.477700e+08,3.736199e+08,3.907191e+08,3.962614e+08,4.300393e+08,4.618834e+08,4.818063e+08,5.219751e+08,5.507287e+08,6.109300e+08,6.844463e+08,6.954289e+08,6.749225e+08,6.812259e+08,6.761296e+08,6.929333e+08,7.212074e+08,7.277148e+08,7.554000e+08,7.744296e+08,7.921778e+08,8.113000e+08,8.253852e+08
257,World,WLD,GDP (current US$),NY.GDP.MKTP.CD,1.369434e+12,1.425105e+12,1.530058e+12,1.648294e+12,1.805661e+12,1.966264e+12,2.133331e+12,2.270936e+12,2.451432e+12,2.704633e+12,2.960864e+12,3.273287e+12,3.777533e+12,4.609093e+12,5.315855e+12,5.920221e+12,6.438082e+12,7.277443e+12,8.584497e+12,9.971137e+12,1.122755e+13,1.162379e+13,1.151448e+13,1.174703e+13,1.217989e+13,1.279334e+13,1.511851e+13,1.720099e+13,1.924414e+13,2.008743e+13,2.262637e+13,2.396656e+13,2.545288e+13,2.585786e+13,2.777070e+13,3.088656e+13,3.157263e+13,3.145807e+13,3.139329e+13,3.256177e+13,3.361862e+13,3.342658e+13,3.470981e+13,3.894481e+13,4.386714e+13,4.751723e+13,5.150202e+13,5.803154e+13,6.367555e+13,6.039554e+13,6.611312e+13,7.344834e+13,7.514600e+13,7.730202e+13,7.945081e+13,7.519876e+13,7.633580e+13,8.122918e+13,8.635707e+13,8.769752e+13
261,South Africa,ZAF,GDP (current US$),NY.GDP.MKTP.CD,7.575397e+09,7.972997e+09,8.497997e+09,9.423396e+09,1.037400e+10,1.133440e+10,1.235500e+10,1.377739e+10,1.489459e+10,1.678039e+10,1.841839e+10,2.033369e+10,2.135744e+10,2.929567e+10,3.680772e+10,3.811454e+10,3.660335e+10,4.065135e+10,4.673945e+10,5.764572e+10,8.298048e+10,8.545442e+10,7.842306e+10,8.741585e+10,7.734409e+10,5.908264e+10,6.752160e+10,8.857370e+10,9.517664e+10,9.903086e+10,1.155523e+11,1.239428e+11,1.345446e+11,1.343081e+11,1.397525e+11,1.554609e+11,1.476063e+11,1.525874e+11,1.377748e+11,1.366323e+11,1.363613e+11,1.215147e+11,1.154824e+11,1.752569e+11,2.285900e+11,2.577727e+11,2.716385e+11,2.994155e+11,2.867698e+11,2.959365e+11,3.753494e+11,4.164189e+11,3.963327e+11,3.668294e+11,3.509046e+11,3.176205e+11,2.963573e+11,3.495541e+11,3.682889e+11,3.514316e+11
262,Zambia,ZMB,GDP (current US$),NY.GDP.MKTP.CD,7.130000e+08,6.962857e+08,6.931429e+08,7.187143e+08,8.394286e+08,1.082857e+09,1.264286e+09,1.368000e+09,1.605857e+09,1.965714e+09,1.825286e+09,1.687000e+09,1.910714e+09,2.268714e+09,3.121833e+09,2.618667e+09,2.746714e+09,2.483000e+09,2.813375e+09,3.325500e+09,3.829500e+09,3.872667e+09,3.994778e+09,3.216308e+09,2.739444e+09,2.281258e+09,1.661949e+09,2.269895e+09,3.713614e+09,3.998638e+09,3.285217e+09,3.378882e+09,3.181922e+09,3.273238e+09,3.656648e+09,3.807067e+09,3.597221e+09,4.303282e+09,3.537683e+09,3.404312e+09,3.600683e+09,4.094481e+09,4.193846e+09,4.901840e+09,6.221078e+09,8.331870e+09,1.275686e+10,1.405696e+10,1.791086e+10,1.532834e+10,2.026556e+10,2.345952e+10,2.550306e+10,2.804551e+10,2.715065e+10,2.124335e+10,2.095476e+10,2.586814e+10,2.700524e+10,2.306472e+10
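Two details of `dropna()` are easy to miss: it returns a new DataFrame instead of changing the original, and it can be limited to certain columns with the `subset` parameter. A minimal sketch on made-up data:

```python
import pandas as pd

toy = pd.DataFrame({
    'Country Name': ['A', 'B', 'C'],
    '1960': [10.0, None, 30.0],
    '2019': [100.0, 200.0, None],
})

# Drops every row that has a NaN in any column; toy itself is unchanged.
complete_rows = toy.dropna()
print(len(toy), len(complete_rows))

# Only require a value in the '2019' column.
recent_only = toy.dropna(subset=['2019'])
print(recent_only)
```

Dropping on every column is strict: here only one of three rows survives, which mirrors how the full GDP table shrinks once any country with a missing year is removed.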


In [9]:
# fillna() can be applied to a DataFrame and returns a new DataFrame with every NaN replaced by the argument you pass into the function
# Run this cell to see the output

df.fillna(0)

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,Aruba,ABW,GDP (current US$),NY.GDP.MKTP.CD,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,4.054634e+08,4.876025e+08,5.964236e+08,6.953044e+08,7.648871e+08,8.721387e+08,9.584632e+08,1.082980e+09,1.245688e+09,1.320475e+09,1.379961e+09,1.531944e+09,1.665101e+09,1.722799e+09,1.873453e+09,1.920112e+09,1.941341e+09,2.021229e+09,2.228492e+09,2.330726e+09,2.424581e+09,2.615084e+09,2.745251e+09,2.498883e+09,2.390503e+09,2.549721e+09,2.534637e+09,2.701676e+09,2.765363e+09,2.919553e+09,2.965922e+09,3.056425e+09,0.000000e+00,0.000000e+00
1,Afghanistan,AFG,GDP (current US$),NY.GDP.MKTP.CD,5.377778e+08,5.488889e+08,5.466667e+08,7.511112e+08,8.000000e+08,1.006667e+09,1.400000e+09,1.673333e+09,1.373333e+09,1.408889e+09,1.748887e+09,1.831109e+09,1.595555e+09,1.733333e+09,2.155555e+09,2.366667e+09,2.555556e+09,2.953333e+09,3.300000e+09,3.697940e+09,3.641723e+09,3.478788e+09,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,4.055180e+09,4.515559e+09,5.226779e+09,6.209138e+09,6.971286e+09,9.747880e+09,1.010923e+10,1.243909e+10,1.585657e+10,1.780429e+10,2.000160e+10,2.056107e+10,2.048489e+10,1.990711e+10,1.936264e+10,2.019176e+10,1.948438e+10,1.910135e+10
2,Angola,AGO,GDP (current US$),NY.GDP.MKTP.CD,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.930503e+09,5.550483e+09,5.550483e+09,5.784342e+09,6.131475e+09,7.553560e+09,7.072063e+09,8.083872e+09,8.769251e+09,1.020110e+10,1.122876e+10,1.060378e+10,8.307811e+09,5.768720e+09,4.438321e+09,5.538749e+09,7.526447e+09,7.648377e+09,6.506230e+09,6.152923e+09,9.129595e+09,8.936064e+09,1.528559e+10,1.781271e+10,2.355205e+10,3.697092e+10,5.238101e+10,6.526645e+10,8.853861e+10,7.030716e+10,8.379950e+10,1.117897e+11,1.280529e+11,1.367099e+11,1.457122e+11,1.161936e+11,1.011239e+11,1.221238e+11,1.013532e+11,9.463542e+10
3,Albania,ALB,GDP (current US$),NY.GDP.MKTP.CD,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,1.857338e+09,1.897050e+09,2.097326e+09,2.080796e+09,2.051236e+09,2.253090e+09,2.028554e+09,1.099559e+09,6.521750e+08,1.185315e+09,1.880952e+09,2.392765e+09,3.199643e+09,2.258516e+09,2.545967e+09,3.212119e+09,3.480355e+09,3.922099e+09,4.348070e+09,5.611492e+09,7.184681e+09,8.052076e+09,8.896074e+09,1.067732e+10,1.288135e+10,1.204422e+10,1.192693e+10,1.289077e+10,1.231983e+10,1.277622e+10,1.322814e+10,1.138685e+10,1.186120e+10,1.301969e+10,1.514702e+10,1.527808e+10
4,Andorra,AND,GDP (current US$),NY.GDP.MKTP.CD,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,7.861921e+07,8.940982e+07,1.134082e+08,1.508201e+08,1.865587e+08,2.201272e+08,2.272810e+08,2.540202e+08,3.080089e+08,4.115783e+08,4.464161e+08,3.889587e+08,3.758960e+08,3.278618e+08,3.300707e+08,3.467380e+08,4.820006e+08,6.113164e+08,7.214259e+08,7.954493e+08,1.029048e+09,1.106929e+09,1.210014e+09,1.007026e+09,1.017549e+09,1.178739e+09,1.223945e+09,1.180597e+09,1.211932e+09,1.239876e+09,1.429049e+09,1.546926e+09,1.755910e+09,2.361727e+09,2.894922e+09,3.159905e+09,3.456442e+09,3.952601e+09,4.085631e+09,3.674410e+09,3.449967e+09,3.629204e+09,3.188809e+09,3.193704e+09,3.271808e+09,2.789870e+09,2.896679e+09,3.000181e+09,3.218316e+09,3.154058e+09
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
259,Kosovo,XKX,GDP (current US$),NY.GDP.MKTP.CD,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,1.849196e+09,2.535334e+09,2.406271e+09,2.790456e+09,3.556757e+09,3.663102e+09,3.846820e+09,4.655899e+09,5.687418e+09,5.653793e+09,5.835874e+09,6.701698e+09,6.499807e+09,7.074778e+09,7.396705e+09,6.442916e+09,6.719172e+09,7.245707e+09,7.942962e+09,7.926108e+09
260,"Yemen, Rep.",YEM,GDP (current US$),NY.GDP.MKTP.CD,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,0.000000e+00,5.647119e+09,5.930370e+09,6.463650e+09,5.368271e+09,4.167356e+09,4.258789e+09,5.785685e+09,6.838557e+09,6.325142e+09,7.641103e+09,9.652436e+09,9.861560e+09,1.069463e+10,1.177797e+10,1.387279e+10,1.674634e+10,1.906198e+10,2.165053e+10,2.691085e+10,2.513027e+10,3.090675e+10,3.272642e+10,3.540134e+10,4.041524e+10,4.320647e+10,3.697620e+10,2.808468e+10,2.456133e+10,2.759126e+10,0.000000e+00
261,South Africa,ZAF,GDP (current US$),NY.GDP.MKTP.CD,7.575397e+09,7.972997e+09,8.497997e+09,9.423396e+09,1.037400e+10,1.133440e+10,1.235500e+10,1.377739e+10,1.489459e+10,1.678039e+10,1.841839e+10,2.033369e+10,2.135744e+10,2.929567e+10,3.680772e+10,3.811454e+10,3.660335e+10,4.065135e+10,4.673945e+10,5.764572e+10,8.298048e+10,8.545442e+10,7.842306e+10,8.741585e+10,7.734409e+10,5.908264e+10,6.752160e+10,8.857370e+10,9.517664e+10,9.903086e+10,1.155523e+11,1.239428e+11,1.345446e+11,1.343081e+11,1.397525e+11,1.554609e+11,1.476063e+11,1.525874e+11,1.377748e+11,1.366323e+11,1.363613e+11,1.215147e+11,1.154824e+11,1.752569e+11,2.285900e+11,2.577727e+11,2.716385e+11,2.994155e+11,2.867698e+11,2.959365e+11,3.753494e+11,4.164189e+11,3.963327e+11,3.668294e+11,3.509046e+11,3.176205e+11,2.963573e+11,3.495541e+11,3.682889e+11,3.514316e+11
262,Zambia,ZMB,GDP (current US$),NY.GDP.MKTP.CD,7.130000e+08,6.962857e+08,6.931429e+08,7.187143e+08,8.394286e+08,1.082857e+09,1.264286e+09,1.368000e+09,1.605857e+09,1.965714e+09,1.825286e+09,1.687000e+09,1.910714e+09,2.268714e+09,3.121833e+09,2.618667e+09,2.746714e+09,2.483000e+09,2.813375e+09,3.325500e+09,3.829500e+09,3.872667e+09,3.994778e+09,3.216308e+09,2.739444e+09,2.281258e+09,1.661949e+09,2.269895e+09,3.713614e+09,3.998638e+09,3.285217e+09,3.378882e+09,3.181922e+09,3.273238e+09,3.656648e+09,3.807067e+09,3.597221e+09,4.303282e+09,3.537683e+09,3.404312e+09,3.600683e+09,4.094481e+09,4.193846e+09,4.901840e+09,6.221078e+09,8.331870e+09,1.275686e+10,1.405696e+10,1.791086e+10,1.532834e+10,2.026556e+10,2.345952e+10,2.550306e+10,2.804551e+10,2.715065e+10,2.124335e+10,2.095476e+10,2.586814e+10,2.700524e+10,2.306472e+10
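The fill value is a modelling decision, not just a formatting one. As a small sketch on made-up numbers, filling with 0 changes the mean, because pandas skips `NaN` by default but treats 0 as a real observation.

```python
import pandas as pd

toy = pd.DataFrame({'1960': [10.0, None, 30.0]})

# mean() ignores NaN by default, so this averages 10 and 30.
mean_skipping_nan = toy['1960'].mean()

# After fillna(0) the missing value counts as a real 0.
mean_with_zeros = toy.fillna(0)['1960'].mean()

print(mean_skipping_nan, mean_with_zeros)
```

For GDP data, a 0 would claim that a country had no economic output that year, so it is worth deciding whether that is really what a missing value should mean.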


## Exercise:

In a separate notebook, do the following:
1. Read in the `gdpdata.csv` file as a DataFrame
2. Follow each step laid out here 
3. At the bottom of your notebook, include a text cell with the mean GDP in 1963
4. Replace all `NaN` values with a value of your choice

Next, we will begin to manipulate DataFrames. 

Looking at our dataset, there are a few columns that will not be useful to our analysis. 
We will remove the `Country Code`, `Indicator Name`, and `Indicator Code` columns.

The `drop()` method takes a list of strings as its first argument, with the list containing each **column** you would like to remove. 

The second argument, `axis`, specifies whether you are removing rows (`axis=0`) or columns (`axis=1`).
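
As a minimal sketch of how `axis` changes what `drop()` removes, the example below uses a small made-up DataFrame. The name `toy` and its values are assumptions for illustration and are not taken from `gdpdata.csv`.

```python
import pandas as pd

# Hypothetical toy data; not from gdpdata.csv
toy = pd.DataFrame({
    "Country Name": ["A", "B", "C"],
    "Country Code": ["AAA", "BBB", "CCC"],
    "1960": [1.0, 2.0, 3.0],
})

# axis=1 removes columns by name
without_code = toy.drop(["Country Code"], axis=1)
print(list(without_code.columns))  # ['Country Name', '1960']

# axis=0 removes rows by index label
without_first_row = toy.drop([0], axis=0)
print(list(without_first_row.index))  # [1, 2]
```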

In [11]:
df.drop(['Country Code', 'Indicator Name', 'Indicator Code'], axis=1)

Unnamed: 0,Country Name,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,Aruba,,,,,,,,,,,,,,,,,,,,,,,,,,,4.054634e+08,4.876025e+08,5.964236e+08,6.953044e+08,7.648871e+08,8.721387e+08,9.584632e+08,1.082980e+09,1.245688e+09,1.320475e+09,1.379961e+09,1.531944e+09,1.665101e+09,1.722799e+09,1.873453e+09,1.920112e+09,1.941341e+09,2.021229e+09,2.228492e+09,2.330726e+09,2.424581e+09,2.615084e+09,2.745251e+09,2.498883e+09,2.390503e+09,2.549721e+09,2.534637e+09,2.701676e+09,2.765363e+09,2.919553e+09,2.965922e+09,3.056425e+09,,
1,Afghanistan,5.377778e+08,5.488889e+08,5.466667e+08,7.511112e+08,8.000000e+08,1.006667e+09,1.400000e+09,1.673333e+09,1.373333e+09,1.408889e+09,1.748887e+09,1.831109e+09,1.595555e+09,1.733333e+09,2.155555e+09,2.366667e+09,2.555556e+09,2.953333e+09,3.300000e+09,3.697940e+09,3.641723e+09,3.478788e+09,,,,,,,,,,,,,,,,,,,,,4.055180e+09,4.515559e+09,5.226779e+09,6.209138e+09,6.971286e+09,9.747880e+09,1.010923e+10,1.243909e+10,1.585657e+10,1.780429e+10,2.000160e+10,2.056107e+10,2.048489e+10,1.990711e+10,1.936264e+10,2.019176e+10,1.948438e+10,1.910135e+10
2,Angola,,,,,,,,,,,,,,,,,,,,,5.930503e+09,5.550483e+09,5.550483e+09,5.784342e+09,6.131475e+09,7.553560e+09,7.072063e+09,8.083872e+09,8.769251e+09,1.020110e+10,1.122876e+10,1.060378e+10,8.307811e+09,5.768720e+09,4.438321e+09,5.538749e+09,7.526447e+09,7.648377e+09,6.506230e+09,6.152923e+09,9.129595e+09,8.936064e+09,1.528559e+10,1.781271e+10,2.355205e+10,3.697092e+10,5.238101e+10,6.526645e+10,8.853861e+10,7.030716e+10,8.379950e+10,1.117897e+11,1.280529e+11,1.367099e+11,1.457122e+11,1.161936e+11,1.011239e+11,1.221238e+11,1.013532e+11,9.463542e+10
3,Albania,,,,,,,,,,,,,,,,,,,,,,,,,1.857338e+09,1.897050e+09,2.097326e+09,2.080796e+09,2.051236e+09,2.253090e+09,2.028554e+09,1.099559e+09,6.521750e+08,1.185315e+09,1.880952e+09,2.392765e+09,3.199643e+09,2.258516e+09,2.545967e+09,3.212119e+09,3.480355e+09,3.922099e+09,4.348070e+09,5.611492e+09,7.184681e+09,8.052076e+09,8.896074e+09,1.067732e+10,1.288135e+10,1.204422e+10,1.192693e+10,1.289077e+10,1.231983e+10,1.277622e+10,1.322814e+10,1.138685e+10,1.186120e+10,1.301969e+10,1.514702e+10,1.527808e+10
4,Andorra,,,,,,,,,,,7.861921e+07,8.940982e+07,1.134082e+08,1.508201e+08,1.865587e+08,2.201272e+08,2.272810e+08,2.540202e+08,3.080089e+08,4.115783e+08,4.464161e+08,3.889587e+08,3.758960e+08,3.278618e+08,3.300707e+08,3.467380e+08,4.820006e+08,6.113164e+08,7.214259e+08,7.954493e+08,1.029048e+09,1.106929e+09,1.210014e+09,1.007026e+09,1.017549e+09,1.178739e+09,1.223945e+09,1.180597e+09,1.211932e+09,1.239876e+09,1.429049e+09,1.546926e+09,1.755910e+09,2.361727e+09,2.894922e+09,3.159905e+09,3.456442e+09,3.952601e+09,4.085631e+09,3.674410e+09,3.449967e+09,3.629204e+09,3.188809e+09,3.193704e+09,3.271808e+09,2.789870e+09,2.896679e+09,3.000181e+09,3.218316e+09,3.154058e+09
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
259,Kosovo,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.849196e+09,2.535334e+09,2.406271e+09,2.790456e+09,3.556757e+09,3.663102e+09,3.846820e+09,4.655899e+09,5.687418e+09,5.653793e+09,5.835874e+09,6.701698e+09,6.499807e+09,7.074778e+09,7.396705e+09,6.442916e+09,6.719172e+09,7.245707e+09,7.942962e+09,7.926108e+09
260,"Yemen, Rep.",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,5.647119e+09,5.930370e+09,6.463650e+09,5.368271e+09,4.167356e+09,4.258789e+09,5.785685e+09,6.838557e+09,6.325142e+09,7.641103e+09,9.652436e+09,9.861560e+09,1.069463e+10,1.177797e+10,1.387279e+10,1.674634e+10,1.906198e+10,2.165053e+10,2.691085e+10,2.513027e+10,3.090675e+10,3.272642e+10,3.540134e+10,4.041524e+10,4.320647e+10,3.697620e+10,2.808468e+10,2.456133e+10,2.759126e+10,
261,South Africa,7.575397e+09,7.972997e+09,8.497997e+09,9.423396e+09,1.037400e+10,1.133440e+10,1.235500e+10,1.377739e+10,1.489459e+10,1.678039e+10,1.841839e+10,2.033369e+10,2.135744e+10,2.929567e+10,3.680772e+10,3.811454e+10,3.660335e+10,4.065135e+10,4.673945e+10,5.764572e+10,8.298048e+10,8.545442e+10,7.842306e+10,8.741585e+10,7.734409e+10,5.908264e+10,6.752160e+10,8.857370e+10,9.517664e+10,9.903086e+10,1.155523e+11,1.239428e+11,1.345446e+11,1.343081e+11,1.397525e+11,1.554609e+11,1.476063e+11,1.525874e+11,1.377748e+11,1.366323e+11,1.363613e+11,1.215147e+11,1.154824e+11,1.752569e+11,2.285900e+11,2.577727e+11,2.716385e+11,2.994155e+11,2.867698e+11,2.959365e+11,3.753494e+11,4.164189e+11,3.963327e+11,3.668294e+11,3.509046e+11,3.176205e+11,2.963573e+11,3.495541e+11,3.682889e+11,3.514316e+11
262,Zambia,7.130000e+08,6.962857e+08,6.931429e+08,7.187143e+08,8.394286e+08,1.082857e+09,1.264286e+09,1.368000e+09,1.605857e+09,1.965714e+09,1.825286e+09,1.687000e+09,1.910714e+09,2.268714e+09,3.121833e+09,2.618667e+09,2.746714e+09,2.483000e+09,2.813375e+09,3.325500e+09,3.829500e+09,3.872667e+09,3.994778e+09,3.216308e+09,2.739444e+09,2.281258e+09,1.661949e+09,2.269895e+09,3.713614e+09,3.998638e+09,3.285217e+09,3.378882e+09,3.181922e+09,3.273238e+09,3.656648e+09,3.807067e+09,3.597221e+09,4.303282e+09,3.537683e+09,3.404312e+09,3.600683e+09,4.094481e+09,4.193846e+09,4.901840e+09,6.221078e+09,8.331870e+09,1.275686e+10,1.405696e+10,1.791086e+10,1.532834e+10,2.026556e+10,2.345952e+10,2.550306e+10,2.804551e+10,2.715065e+10,2.124335e+10,2.095476e+10,2.586814e+10,2.700524e+10,2.306472e+10


In [14]:
#df.columns

#MEAN GDP 1963
#df.describe()
#8.188647e+10

#df.index
#df.mean()

df.fillna("potato")

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,Aruba,ABW,GDP (current US$),NY.GDP.MKTP.CD,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,4.05463e+08,4.87602e+08,5.96424e+08,6.95304e+08,7.64887e+08,8.72139e+08,9.58463e+08,1.08298e+09,1.24569e+09,1.32047e+09,1.37996e+09,1.53194e+09,1.6651e+09,1.7228e+09,1.87345e+09,1.92011e+09,1.94134e+09,2.02123e+09,2.22849e+09,2.33073e+09,2.42458e+09,2.61508e+09,2.74525e+09,2.49888e+09,2.3905e+09,2.54972e+09,2.53464e+09,2.70168e+09,2.76536e+09,2.91955e+09,2.96592e+09,3.05642e+09,potato,potato
1,Afghanistan,AFG,GDP (current US$),NY.GDP.MKTP.CD,5.37778e+08,5.48889e+08,5.46667e+08,7.51111e+08,8e+08,1.00667e+09,1.4e+09,1.67333e+09,1.37333e+09,1.40889e+09,1.74889e+09,1.83111e+09,1.59556e+09,1.73333e+09,2.15556e+09,2.36667e+09,2.55556e+09,2.95333e+09,3.3e+09,3.69794e+09,3.64172e+09,3.47879e+09,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,4.05518e+09,4.51556e+09,5.22678e+09,6.20914e+09,6.97129e+09,9.74788e+09,1.01092e+10,1.24391e+10,1.58566e+10,1.78043e+10,2.00016e+10,2.05611e+10,2.04849e+10,1.99071e+10,1.93626e+10,2.01918e+10,1.94844e+10,1.91014e+10
2,Angola,AGO,GDP (current US$),NY.GDP.MKTP.CD,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,5.9305e+09,5.55048e+09,5.55048e+09,5.78434e+09,6.13148e+09,7.55356e+09,7.07206e+09,8.08387e+09,8.76925e+09,1.02011e+10,1.12288e+10,1.06038e+10,8.30781e+09,5.76872e+09,4.43832e+09,5.53875e+09,7.52645e+09,7.64838e+09,6.50623e+09,6.15292e+09,9.12959e+09,8.93606e+09,1.52856e+10,1.78127e+10,2.35521e+10,3.69709e+10,5.2381e+10,6.52665e+10,8.85386e+10,7.03072e+10,8.37995e+10,1.1179e+11,1.28053e+11,1.3671e+11,1.45712e+11,1.16194e+11,1.01124e+11,1.22124e+11,1.01353e+11,9.46354e+10
3,Albania,ALB,GDP (current US$),NY.GDP.MKTP.CD,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,1.85734e+09,1.89705e+09,2.09733e+09,2.0808e+09,2.05124e+09,2.25309e+09,2.02855e+09,1.09956e+09,6.52175e+08,1.18532e+09,1.88095e+09,2.39276e+09,3.19964e+09,2.25852e+09,2.54597e+09,3.21212e+09,3.48036e+09,3.9221e+09,4.34807e+09,5.61149e+09,7.18468e+09,8.05208e+09,8.89607e+09,1.06773e+10,1.28814e+10,1.20442e+10,1.19269e+10,1.28908e+10,1.23198e+10,1.27762e+10,1.32281e+10,1.13868e+10,1.18612e+10,1.30197e+10,1.5147e+10,1.52781e+10
4,Andorra,AND,GDP (current US$),NY.GDP.MKTP.CD,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,7.86192e+07,8.94098e+07,1.13408e+08,1.5082e+08,1.86559e+08,2.20127e+08,2.27281e+08,2.5402e+08,3.08009e+08,4.11578e+08,4.46416e+08,3.88959e+08,3.75896e+08,3.27862e+08,3.30071e+08,3.46738e+08,4.82001e+08,6.11316e+08,7.21426e+08,7.95449e+08,1.02905e+09,1.10693e+09,1.21001e+09,1.00703e+09,1.01755e+09,1.17874e+09,1.22395e+09,1.1806e+09,1.21193e+09,1.23988e+09,1.42905e+09,1.54693e+09,1.75591e+09,2.36173e+09,2.89492e+09,3.15991e+09,3.45644e+09,3.9526e+09,4.08563e+09,3.67441e+09,3.44997e+09,3.6292e+09,3.18881e+09,3.1937e+09,3.27181e+09,2.78987e+09,2.89668e+09,3.00018e+09,3.21832e+09,3.15406e+09
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
259,Kosovo,XKX,GDP (current US$),NY.GDP.MKTP.CD,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,1.8492e+09,2.53533e+09,2.40627e+09,2.79046e+09,3.55676e+09,3.6631e+09,3.84682e+09,4.6559e+09,5.68742e+09,5.65379e+09,5.83587e+09,6.7017e+09,6.49981e+09,7.07478e+09,7.39671e+09,6.44292e+09,6.71917e+09,7.24571e+09,7.94296e+09,7.92611e+09
260,"Yemen, Rep.",YEM,GDP (current US$),NY.GDP.MKTP.CD,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,potato,5.64712e+09,5.93037e+09,6.46365e+09,5.36827e+09,4.16736e+09,4.25879e+09,5.78569e+09,6.83856e+09,6.32514e+09,7.6411e+09,9.65244e+09,9.86156e+09,1.06946e+10,1.1778e+10,1.38728e+10,1.67463e+10,1.9062e+10,2.16505e+10,2.69109e+10,2.51303e+10,3.09067e+10,3.27264e+10,3.54013e+10,4.04152e+10,4.32065e+10,3.69762e+10,2.80847e+10,2.45613e+10,2.75913e+10,potato
261,South Africa,ZAF,GDP (current US$),NY.GDP.MKTP.CD,7.5754e+09,7.973e+09,8.498e+09,9.4234e+09,1.0374e+10,1.13344e+10,1.2355e+10,1.37774e+10,1.48946e+10,1.67804e+10,1.84184e+10,2.03337e+10,2.13574e+10,2.92957e+10,3.68077e+10,3.81145e+10,3.66033e+10,4.06513e+10,4.67394e+10,5.76457e+10,8.29805e+10,8.54544e+10,7.84231e+10,8.74159e+10,7.73441e+10,5.90826e+10,6.75216e+10,8.85737e+10,9.51766e+10,9.90309e+10,1.15552e+11,1.23943e+11,1.34545e+11,1.34308e+11,1.39753e+11,1.55461e+11,1.47606e+11,1.52587e+11,1.37775e+11,1.36632e+11,1.36361e+11,1.21515e+11,1.15482e+11,1.75257e+11,2.2859e+11,2.57773e+11,2.71638e+11,2.99416e+11,2.8677e+11,2.95936e+11,3.75349e+11,4.16419e+11,3.96333e+11,3.66829e+11,3.50905e+11,3.17621e+11,2.96357e+11,3.49554e+11,3.68289e+11,3.51432e+11
262,Zambia,ZMB,GDP (current US$),NY.GDP.MKTP.CD,7.13e+08,6.96286e+08,6.93143e+08,7.18714e+08,8.39429e+08,1.08286e+09,1.26429e+09,1.368e+09,1.60586e+09,1.96571e+09,1.82529e+09,1.687e+09,1.91071e+09,2.26871e+09,3.12183e+09,2.61867e+09,2.74671e+09,2.483e+09,2.81338e+09,3.3255e+09,3.8295e+09,3.87267e+09,3.99478e+09,3.21631e+09,2.73944e+09,2.28126e+09,1.66195e+09,2.26989e+09,3.71361e+09,3.99864e+09,3.28522e+09,3.37888e+09,3.18192e+09,3.27324e+09,3.65665e+09,3.80707e+09,3.59722e+09,4.30328e+09,3.53768e+09,3.40431e+09,3.60068e+09,4.09448e+09,4.19385e+09,4.90184e+09,6.22108e+09,8.33187e+09,1.27569e+10,1.4057e+10,1.79109e+10,1.53283e+10,2.02656e+10,2.34595e+10,2.55031e+10,2.80455e+10,2.71506e+10,2.12433e+10,2.09548e+10,2.58681e+10,2.70052e+10,2.30647e+10
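
Any fill value works, but a string such as `"potato"` turns the numeric year columns into columns of mixed objects, so numeric methods such as `mean()` can no longer be applied to them. The sketch below shows the difference on made-up data; the `toy` frame and the fill value `0` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value
toy = pd.DataFrame({
    "Country Name": ["A", "B", "C"],
    "1963": [1.0, np.nan, 3.0],
})

# mean() skips NaN by default
print(toy["1963"].mean())  # 2.0

# Filling with a number keeps the column numeric
filled_zero = toy.fillna(0)
print(filled_zero["1963"].dtype)  # float64

# Filling with a string makes the column hold mixed Python objects
filled_text = toy.fillna("potato")
print(filled_text["1963"].dtype)  # object
```

Keep in mind that a numeric fill value also takes part in later calculations: after `fillna(0)` the mean of this toy column is no longer 2.0.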


As we can see here, our data cleaning work from earlier was not saved! `drop()` returns a new DataFrame and leaves the original unchanged, so the columns are still there because we never updated our `df` variable with the result. 

Let's rerun that below and make sure we save it this time. You can use variables to store DataFrames the same way you can use them to store lists, ints, or strings.

The cell below will assign the updated DataFrame back to the `df` variable.

In [35]:
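# Drop rows with missing values, then remove the columns we don't need,
# assigning each result back to df so the changes are kept
df = df.dropna()
df = df.drop(['Country Code', 'Indicator Name', 'Indicator Code'], axis=1)

df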
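
To see the effect of the reassignment in isolation, here is a small sketch on made-up data. The `toy` frame is hypothetical; only `dropna()` and `drop()` come from the lesson.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value
toy = pd.DataFrame({
    "Country Name": ["A", "B", "C"],
    "Country Code": ["AAA", "BBB", "CCC"],
    "1960": [1.0, np.nan, 3.0],
})

# Calling the method without assignment does not change toy
toy.dropna()
print(toy.shape)  # (3, 3)

# Assigning the result back keeps the change
toy = toy.dropna()
toy = toy.drop(["Country Code"], axis=1)
print(toy.shape)  # (2, 2)

# The surviving rows keep their original index labels
print(list(toy.index))  # [0, 2]
```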

## Exercise: 

Update your separate notebook from the earlier exercise to include what we learned about using DataFrame variables.
You should also drop all rows with null values and remove the three columns we removed in this exercise. 

# Moving On:

In Pandas, you can conditionally select rows based on the value a particular column contains. See the example below for selecting the row containing France.

The second example selects two columns and displays them as a DataFrame. 

The third example cell shows how to use `iloc` to select specific values from your DataFrame. `iloc` selects by integer position (row first, then column) rather than by label.



In [18]:
df[df['Country Name'] == 'France']

Unnamed: 0,Country Name,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
75,France,62225480000.0,67461640000.0,75607530000.0,84759200000.0,94007850000.0,101537200000.0,110045900000.0,118973000000.0,129785400000.0,141903100000.0,148456400000.0,165966600000.0,203494100000.0,264429900000.0,285552400000.0,360832200000.0,372319000000.0,410279500000.0,506707800000.0,613953100000.0,701288400000.0,615552200000.0,584877700000.0,559869200000.0,530683800000.0,553138400000.0,771470800000.0,934173300000.0,1018847000000.0,1025212000000.0,1269180000000.0,1269277000000.0,1401466000000.0,1322816000000.0,1393983000000.0,1601095000000.0,1605675000000.0,1452885000000.0,1503109000000.0,1492648000000.0,1362249000000.0,1376465000000.0,1494287000000.0,1840481000000.0,2115742000000.0,2196126000000.0,2318594000000.0,2657213000000.0,2918383000000.0,2690222000000.0,2642610000000.0,2861408000000.0,2683825000000.0,2811078000000.0,2852166000000.0,2438208000000.0,2471286000000.0,2595151000000.0,2787864000000.0,2715518000000.0


In [19]:
df[['Country Name','1965']]

Unnamed: 0,Country Name,1965
11,Australia,2.593684e+10
12,Austria,9.994071e+09
14,Burundi,1.589950e+08
15,Belgium,1.737146e+10
16,Benin,2.899087e+08
...,...,...
251,St. Vincent and the Grenadines,1.510821e+07
257,World,1.966264e+12
261,South Africa,1.133440e+10
262,Zambia,1.082857e+09


You can also iterate over a column in Pandas much like you would iterate through a list. 

In [20]:
for item in df['1965']:
  print(item)

25936835032.0
9994070616.0
158994963.0
17371457608.0
289908720.6
422916848.4
5906636557.0
300392156.9
40069930.07
604377104.4
21790035117.0
45790869.75
150574816.3
54515179581.0
15346741670.0
6026593750.0
70436266147.0
919771356.4
814139855.8
4043901818.0
198318063.9
5760761905.0
592981162.3
2660946061.0
888100000.0
94580860952.0
209370309381.0
224575509177.0
2387048255.0
407845094423.0
24756958695.0
22060164236.0
8589340019.0
147084750.0
101537248148.0
226474285.6
101824755079.0
2053462872.0
7689154053.0
1331399900.0
213235294.1
1550776282591.0
2435078534.0
508650000.0
24406875600.0
353251800.0
401055550813.0
454160665242.0
53617985478.0
17328250389.0
37719212748.0
59554854575.0
2945704143.0
523694949.4
3663333333.0
67978153851.0
972140557.2
90950278258.0
997919320.0
13593932.32
3120833333.0
105189609504.0
118564680313.0
1698319328.0
131174202424.0
432952960394.0
54878902.42
236616225676.0
921600736.3
2948325264.0
833563472.2
21840000000.0
415005604580.0
229455410.9
2956356984.0
79832

In [21]:
df.iloc[4]['1965']

289908720.6
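
Note that `iloc` counts positions, not index labels. After `dropna()`, the labels shown on the left of the output (11, 12, 14, ...) no longer match the positions (0, 1, 2, ...), so `df.iloc[4]` returns the fifth remaining row, which here is Benin (label 16). The sketch below illustrates this on made-up data; `toy` and its labels are assumptions, and `loc` is the label-based counterpart of `iloc`.

```python
import pandas as pd

# Hypothetical frame whose index labels have gaps, as after dropna()
toy = pd.DataFrame(
    {"Country Name": ["A", "B", "C"], "1965": [10.0, 20.0, 30.0]},
    index=[11, 12, 14],
)

# iloc selects by integer position: the third row
print(toy.iloc[2]["1965"])  # 30.0

# loc selects by index label: the row labelled 14 (the same row here)
print(toy.loc[14, "1965"])  # 30.0

# iloc also accepts a column position as its second argument
print(toy.iloc[2, 1])  # 30.0
```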

## Exercise: 
    
Select the row which contains South Africa. Use `iloc` to return its GDP in 1974 and 1975. Assign each value to its own variable.

Create an `if-statement` that checks which year had a larger GDP and prints a statement informing the user of which year is higher.

**CHALLENGE**: Write a `for` loop that iterates through each year of South Africa's GDP data and returns the year with the highest GDP figure, and what that figure is.

In [29]:
#df[df['Country Name'] == 'South Africa']

SA_data = df[['Country Name', '1974', '1975']]
SA_data = SA_data[SA_data['Country Name'] == 'South Africa']

SA_data

#SA_GDP_1974 = df.iloc[4]['1974'] #3.680772e+10
#SA_GDP_1975 = df.iloc[4]['1975'] #3.811454e+10

#print(SA_GDP_1974)

Unnamed: 0,Country Name,1974,1975
261,South Africa,36807720000.0,38114540000.0


In [None]:
#for item in df['1975']:
#  print(item)

In [45]:
SouthAfrica = df[df['Country Name'] == 'South Africa']
SA_First = SouthAfrica.iloc[0]['1974']
SA_Second = SouthAfrica.iloc[0]['1975']
print(SA_First)
print(SA_Second)

if SA_First > SA_Second:
  print("1974 is higher than 1975")
else:
  print("1975 is higher than 1974")




36807721039.0
38114542813.0
1975 is higher than 1974


In [50]:
print(SouthAfrica)

#Write a for loop that iterates through each year of South Africa's GDP
#data and returns the year with the highest GDP figure, and what that figure is.
rawSA = SouthAfrica.drop(['Country Name'], axis=1)
topItem = 0

for item in rawSA.iloc[0]:
  if item > topItem:
    topItem = item

print(topItem)

#topInfo = SouthAfrica[SouthAfrica[] == topItem]
#print(topInfo)

     Country Name          1960  ...          2018          2019
261  South Africa  7.575397e+09  ...  3.682889e+11  3.514316e+11

[1 rows x 61 columns]
416418874936.0
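
The loop above finds the largest value but not the year it belongs to. One way to keep track of both, sketched here on a made-up one-row frame (the values and the name `toy` are illustrative, not the real figures), is to iterate over `(label, value)` pairs with `Series.items()`. The built-in `idxmax()` gives the same label without a loop.

```python
import pandas as pd

# Hypothetical one-row frame standing in for the South Africa row
toy = pd.DataFrame({
    "Country Name": ["South Africa"],
    "2010": [3.0],
    "2011": [5.0],
    "2012": [4.0],
})

# Keep only the year columns, then take the single row as a Series
years = toy.drop(["Country Name"], axis=1).iloc[0]

top_year = None
top_value = float("-inf")
for year, value in years.items():
    if value > top_value:
        top_year = year
        top_value = value

print(top_year, top_value)  # 2011 5.0

# The built-in equivalent
print(years.idxmax(), years.max())  # 2011 5.0
```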


# Workshop

Now that we have learned how to read in a CSV, assess our data, and manipulate it, you will take the ```academy_awards.csv``` file you have been working with and go through the following steps:

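1. Read in the data using Pandas' ```read_csv``` function. NOTE: use the following code to import the file, or else you will get an error message: ```df = pd.read_csv("academy_awards.csv", error_bad_lines=False)``` (The `error_bad_lines` argument was deprecated in pandas 1.3 and removed in pandas 2.0; on those versions, use ```on_bad_lines='skip'``` instead.)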
2. Get familiar with your data, jot down some notes in a text cell in your notebook about what stands out
3. ```names = ['Brad Pitt', 'Leonardo DiCaprio', 'Julia Roberts', 'George Clooney']``` Using this list, find the rows in the DataFrame where one of these actors was nominated for an award. Write code that iterates over the list of names, printing "In year_nominated, name_of_actor was nominated for an Academy Award." If there are multiple results, choose the second row in the resulting DataFrame.

HINT: You will need to use iloc for step 3.


In [30]:
import pandas as pd

#Read in the data using Pandas' read_csv function
awards = pd.read_csv("academy_awards.csv", error_bad_lines=False)

print("done")

done


In [34]:
#awards.shape
#(10137, 11)

#awards.columns
#'Year', 'Category', 'Nominee', 'Additional Info', 'Won?', 'Unnamed: 5',
#'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8', 'Unnamed: 9', 'Unnamed: 10'],
#dtype='object'

awards.head(5)

#WHAT STANDS OUT?
#it's weird that there are so many Unnamed headers

Unnamed: 0,Year,Category,Nominee,Additional Info,Won?,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10
0,2010 (83rd),Actor -- Leading Role,Javier Bardem,Biutiful {'Uxbal'},NO,,,,,,
1,2010 (83rd),Actor -- Leading Role,Jeff Bridges,True Grit {'Rooster Cogburn'},NO,,,,,,
2,2010 (83rd),Actor -- Leading Role,Jesse Eisenberg,The Social Network {'Mark Zuckerberg'},NO,,,,,,
3,2010 (83rd),Actor -- Leading Role,Colin Firth,The King's Speech {'King George VI'},YES,,,,,,
4,2010 (83rd),Actor -- Leading Role,James Franco,127 Hours {'Aron Ralston'},NO,,,,,,


In [126]:
names = ['Brad Pitt', 'Leonardo DiCaprio', 'Julia Roberts', 'George Clooney']

#find the rows in the dataframe where one of these actors were nominated for an award
#Write code that iterates over the list of names
#printing "In year_nominated, name_of_actor was nominated for an Academy Award."

#If there are multiple results, choose the second row in your dataframe.

slim_data = awards[['Year', 'Nominee', 'Won?']]
#print(slim_data)

namesOnlydata = []
nominee_Info = {}

for n in names:
  nominee_Info = {}
  namesOnly = slim_data[slim_data['Nominee'] == n]
  nominee_Info[n] = namesOnly['Year']
  namesOnlydata.append(nominee_Info)


#"In year_nominated, name_of_actor was nominated for an Academy Award."
for n in namesOnlydata:
  for key, value in n.items():   
    for item in value:
      item = item[:-7]
      print(f"In {item}, {key} was nominated for an Academy Award.")
    print()

#--------------------------------------


#I know this can't be the right way to do this but I didn't fully understand
#pandas so I just used dictionaries and lists



In 2008, Brad Pitt was nominated for an Academy Award.
In 1995, Brad Pitt was nominated for an Academy Award.

In 2006, Leonardo DiCaprio was nominated for an Academy Award.
In 2004, Leonardo DiCaprio was nominated for an Academy Award.
In 1993, Leonardo DiCaprio was nominated for an Academy Award.

In 2000, Julia Roberts was nominated for an Academy Award.
In 1990, Julia Roberts was nominated for an Academy Award.
In 1989, Julia Roberts was nominated for an Academy Award.

In 2009, George Clooney was nominated for an Academy Award.
In 2007, George Clooney was nominated for an Academy Award.
In 2005, George Clooney was nominated for an Academy Award.
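
For comparison, the same kind of result can be reached while staying closer to Pandas. The sketch below uses a small made-up stand-in for `academy_awards.csv`: the rows and actor names are invented for illustration, and only the `Year` and `Nominee` columns are assumed. It filters with a boolean mask per name and uses `iloc` to pick the second row when there is more than one match, as step 3 asks.

```python
import pandas as pd

# Hypothetical stand-in for the awards data; rows are invented
awards_toy = pd.DataFrame({
    "Year": ["2008 (81st)", "1995 (68th)", "2000 (73rd)"],
    "Nominee": ["Actor A", "Actor A", "Actor B"],
})

names = ["Actor A", "Actor B", "Actor C"]
messages = []

for name in names:
    matches = awards_toy[awards_toy["Nominee"] == name]
    if matches.empty:
        continue  # no nomination found for this name
    # Second row if there are several matches, otherwise the only row
    row = matches.iloc[1] if len(matches) > 1 else matches.iloc[0]
    year = row["Year"].split(" ")[0]  # "1995 (68th)" -> "1995"
    messages.append(f"In {year}, {name} was nominated for an Academy Award.")

for message in messages:
    print(message)
```

Splitting the `Year` string on the first space avoids relying on the ceremony suffix always being seven characters long, which the `item[:-7]` slice above assumes.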

