# pandas

*pandas* is an open source Python library that provides fast, flexible and easy-to-use tools for data analysis and manipulation. Its expressive data structures are designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive.

It offers the data exploration, cleaning and transformation operations that are central to working with data in Python.

*pandas* builds upon *numpy* (with *scipy* as an optional dependency for some features), providing easy-to-use data structures and data manipulation functions with integrated indexing.

The main data structures *pandas* provides are *Series* and *DataFrame*. After a brief introduction to these two data structures and to data ingestion, this notebook covers the following key features of *pandas*:
* Generating descriptive statistics on data
* Data cleaning using built-in pandas functions
* Frequent data operations for subsetting, filtering, insertion, deletion and aggregation of data
* Merging multiple datasets using dataframes
* Working with timestamps and time-series data

This notebook is a summary of an introduction to pandas.
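
As a quick orientation to the two data structures named above, here is a minimal sketch. The variable names (`degrees_2000`, `degrees`) are made up for illustration, and the three values per column mirror rows of the degrees dataset used in this notebook.

```python
import pandas as pd

# A Series is a one-dimensional labeled array; the labels form its index.
degrees_2000 = pd.Series(
    [573.0, 3.0, 789.0],
    index=["Alabama", "Alaska", "Arizona"],
    name="2000",
)

# A DataFrame is a two-dimensional table whose columns share one index.
degrees = pd.DataFrame(
    {"2000": [573.0, 3.0, 789.0], "2001": [640.0, 0.0, 996.0]},
    index=["Alabama", "Alaska", "Arizona"],
)

# Selecting a single column of a DataFrame gives back a Series.
col = degrees["2000"]
print(type(col).__name__)

# Label-based lookup of one cell: row "Alaska", column "2001".
print(degrees.loc["Alaska", "2001"])
```

Both structures carry their index with them, which is what the "integrated indexing" mentioned earlier refers to.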

In [1]:
# Import the library with alias
import pandas as pd

In [6]:
# Associate's Degrees in Science and Engineering Conferred per 1,000 Individuals 18–24 Years Old (Degrees)
# this a publicly 
df = pd.read_excel(
    "se-associates-degrees-per-1000-18-24-year-olds.xlsx",
    skiprows=[0, 1, 2],
    usecols=[0, 1, 2, 3, 4, 5, 6, 7],
)
df.head()

Unnamed: 0,State,2000,2001,2002,2003,2004,2005,2006
0,United States,38539.0,45358.0,51325.0,62930.0,60246.0,54750.0,49850.0
1,Alabama,573.0,640.0,714.0,885.0,746.0,635.0,509.0
2,Alaska,3.0,0.0,0.0,62.0,37.0,20.0,32.0
3,Arizona,789.0,996.0,1094.0,1328.0,1515.0,1373.0,1295.0
4,Arkansas,181.0,163.0,205.0,288.0,218.0,215.0,227.0
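
The `skiprows` and `usecols` arguments used in the cell above can be tried without the spreadsheet. The sketch below uses `pd.read_csv` on an in-memory string as a stand-in for `pd.read_excel`, which accepts arguments of the same names. The three title lines and the variable names are invented for illustration; only the Alabama and Alaska values come from the output above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the spreadsheet: three title lines precede the header row.
raw = io.StringIO(
    "Table title\n"
    "Table subtitle\n"
    "Units: degrees\n"
    "State,2000,2001,2002\n"
    "Alabama,573,640,714\n"
    "Alaska,3,0,0\n"
)

# skiprows drops the three title lines (0-based line numbers),
# so the fourth line is read as the header.
# usecols keeps only the first three columns, selected by position.
sample = pd.read_csv(raw, skiprows=[0, 1, 2], usecols=[0, 1, 2])

print(sample.columns.tolist())
print(sample.shape)
```

One difference to keep in mind: a CSV header always yields string column names, whereas a spreadsheet may yield numeric year labels.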


In [3]:
df.tail(10)

Unnamed: 0,State,2000,2001,2002,2003,2004,2005,2006,2007,2008,...,2009.2,2010.2,2011.2,2012.2,2013.2,2014.2,2015.2,2016.2,2017.2,2018.2
50,Wisconsin,591.0,806.0,918.0,1554.0,1489.0,1282.0,1080.0,984.0,875.0,...,1.643302,1.849928,2.396863,2.674979,2.511796,2.222372,1.956355,2.070239,2.004531,2.078784
51,Wyoming,219.0,271.0,298.0,321.0,311.0,307.0,315.0,313.0,327.0,...,5.327669,5.92924,6.061576,6.095027,6.78706,6.443389,7.809026,7.240018,8.204642,8.745966
52,,,,,,,,,,,...,,,,,,,,,,
53,Puerto Rico,855.0,709.0,606.0,537.0,557.0,434.0,331.0,343.0,315.0,...,0.948617,1.034899,0.984925,1.026267,0.772984,1.10652,0.971929,0.837658,1.047461,0.927064
54,,,,,,,,,,,...,,,,,,,,,,
55,NOTES: The national associate's S&E degrees to...,,,,,,,,,,...,,,,,,,,,,
56,SOURCE: National Center for Education Statisti...,,,,,,,,,,...,,,,,,,,,,
57,Recommended Citation: National Science Board. ...,,,,,,,,,,...,,,,,,,,,,
58,"Last updated: April 29, 2020",,,,,,,,,,...,,,,,,,,,,
59,Science and Engineering Indicators,,,,,,,,,,...,,,,,,,,,,


In [4]:
df.iloc[:54]

Unnamed: 0,State,2000,2001,2002,2003,2004,2005,2006,2007,2008,...,2009.2,2010.2,2011.2,2012.2,2013.2,2014.2,2015.2,2016.2,2017.2,2018.2
0,United States,38539.0,45358.0,51325.0,62930.0,60246.0,54750.0,49850.0,47485.0,49166.0,...,1.756204,1.997881,2.416858,2.670736,2.692942,2.768223,2.877261,2.851157,3.001163,3.20252
1,Alabama,573.0,640.0,714.0,885.0,746.0,635.0,509.0,357.0,383.0,...,0.769579,0.858799,1.019051,1.04643,0.975455,0.914706,0.90472,0.737081,0.651524,0.545666
2,Alaska,3.0,0.0,0.0,62.0,37.0,20.0,32.0,16.0,14.0,...,0.177477,0.477713,1.247997,0.977505,1.205228,1.005332,0.741492,0.724706,0.833785,0.511531
3,Arizona,789.0,996.0,1094.0,1328.0,1515.0,1373.0,1295.0,1496.0,2646.0,...,8.227975,12.852525,17.843667,18.761808,13.842297,11.919521,10.029093,7.814437,6.417152,5.004393
4,Arkansas,181.0,163.0,205.0,288.0,218.0,215.0,227.0,213.0,171.0,...,0.741258,0.816092,0.618154,0.587879,0.638613,0.691922,0.73523,0.635181,0.683839,0.976556
5,California,7620.0,7504.0,8830.0,10099.0,9435.0,8923.0,8839.0,9011.0,9636.0,...,2.885627,3.231146,3.83052,4.379885,5.259684,5.963148,6.660418,7.750218,8.617775,10.213075
6,Colorado,601.0,755.0,870.0,802.0,720.0,553.0,355.0,294.0,320.0,...,1.114751,1.185648,0.892685,0.769913,0.841889,0.81638,0.909904,0.810885,0.792038,0.793428
7,Connecticut,115.0,122.0,162.0,239.0,233.0,266.0,220.0,167.0,170.0,...,0.56898,0.558858,0.64429,0.702786,0.792473,0.784862,0.921829,0.846045,0.943936,0.954251
8,Delaware,83.0,78.0,66.0,71.0,120.0,119.0,157.0,133.0,96.0,...,1.470064,1.308195,1.467216,1.553381,2.028265,2.502709,2.5052,2.911721,3.145289,3.46029
9,District of Columbia,69.0,216.0,238.0,182.0,220.0,41.0,206.0,160.0,137.0,...,1.247881,0.522042,0.527211,0.296114,0.265998,0.386737,0.418618,0.212106,0.187291,0.269753
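
`df.iloc[:54]` trims the footer by position: it keeps rows 0 through 53, which still includes the blank row at index 52. As a possible alternative, the sketch below drops rows by content instead. The `toy` frame is a made-up miniature of the tail shown earlier, and the footer text is a placeholder, not the real note.

```python
import numpy as np
import pandas as pd

# Miniature of the tail of the dataset: data rows, blank rows and a footer note.
toy = pd.DataFrame(
    {
        "State": ["Wisconsin", "Wyoming", np.nan, "Puerto Rico", np.nan, "NOTES: (footer text)"],
        "2000": [591.0, 219.0, np.nan, 855.0, np.nan, np.nan],
        "2001": [806.0, 271.0, np.nan, 709.0, np.nan, np.nan],
    }
)

# Positional slice: the end position is exclusive, so rows 0..3 are kept,
# including the blank row at position 2.
by_position = toy.iloc[:4]

# Content-based filter: drop every row whose year columns are all missing.
# This removes both the blank rows and the footer note.
year_cols = ["2000", "2001"]
by_content = toy.dropna(subset=year_cols, how="all")

print(len(by_position), len(by_content))
```

The content-based version does not depend on knowing where the footer starts, but it keeps the original index labels, so a `reset_index(drop=True)` may be wanted afterwards.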


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60 entries, 0 to 59
Data columns (total 60 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   State        58 non-null     object 
 1   2000         53 non-null     float64
 2   2001         53 non-null     float64
 3   2002         53 non-null     float64
 4   2003         53 non-null     float64
 5   2004         53 non-null     float64
 6   2005         53 non-null     float64
 7   2006         53 non-null     float64
 8   2007         53 non-null     float64
 9   2008         53 non-null     float64
 10  2009         53 non-null     float64
 11  2010         53 non-null     float64
 12  2011         53 non-null     float64
 13  2012         53 non-null     float64
 14  2013         53 non-null     float64
 15  2014         53 non-null     float64
 16  2015         53 non-null     float64
 17  2016         53 non-null     float64
 18  2017         53 non-null     float64
 19  2018      
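
The non-null counts and dtypes that `df.info()` prints can also be obtained as pandas objects, which is convenient for checking them in code. This is a minimal sketch on a made-up `toy` frame, not on the dataset above.

```python
import numpy as np
import pandas as pd

# Toy frame with one blank row and one footer-style row, as in the dataset's tail.
toy = pd.DataFrame(
    {
        "State": ["Wyoming", np.nan, "Puerto Rico", "NOTES: (footer text)"],
        "2000": [219.0, np.nan, 855.0, np.nan],
    }
)

# count() returns the number of non-null values per column as a Series.
non_null = toy.count()

# dtypes returns the dtype of each column as a Series.
dtypes = toy.dtypes

print(non_null)
print(dtypes)
```

A gap between a column's non-null count and the number of rows, like the one visible in the `info()` output above, points to rows that need cleaning.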

