<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Series-and-Dataframes" data-toc-modified-id="Series-and-Dataframes-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Series and Dataframes</a></span><ul class="toc-item"><li><span><a href="#Series" data-toc-modified-id="Series-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Series</a></span><ul class="toc-item"><li><span><a href="#Create-a-Series" data-toc-modified-id="Create-a-Series-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Create a Series</a></span></li><li><span><a href="#Using-the-Series-index" data-toc-modified-id="Using-the-Series-index-1.1.2"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>Using the Series index</a></span></li></ul></li><li><span><a href="#Dataframes" data-toc-modified-id="Dataframes-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Dataframes</a></span><ul class="toc-item"><li><span><a href="#Create-a-DataFrame" data-toc-modified-id="Create-a-DataFrame-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Create a DataFrame</a></span></li><li><span><a href="#Using-the-DataFrame-index" data-toc-modified-id="Using-the-DataFrame-index-1.2.2"><span class="toc-item-num">1.2.2&nbsp;&nbsp;</span>Using the DataFrame index</a></span></li></ul></li></ul></li></ul></div>

From the pandas documentation: https://pandas.pydata.org/docs/getting_started/overview.html

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, DataFrame provides everything that R’s data.frame provides and much more. pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other 3rd party libraries.

Here are just a few of the things that pandas does well:

- Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data

- Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects

- Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data

- Intuitive merging and joining data sets

- Flexible reshaping and pivoting of data sets

- Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving / loading data from the ultrafast HDF5 format
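
Two of the points above, missing-data handling and split-apply-combine, can be sketched in a few lines. The `orders` table and its column names are made up for illustration and are not part of the pandas documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in the 'sent' column
orders = pd.DataFrame({
    'state': ['IL', 'IL', 'GA', 'GA'],
    'sent': [10.0, np.nan, 4.0, 6.0],
})

# Aggregating: split by state, sum each group (NaN is skipped by default)
totals = orders.groupby('state')['sent'].sum()

# Transforming: each row's share of its own state's total
orders['share'] = orders['sent'] / orders.groupby('state')['sent'].transform('sum')

print(totals)
print(orders)
```

The aggregation returns one row per group, while the transformation returns one value per original row, which is why it can be assigned back as a new column.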

In [None]:
import numpy as np
import pandas as pd
from numpy.random import randn  # lets the cells below call randn(...) directly


# Series and Dataframes

## Series

**Series** is a one-dimensional labeled array capable of holding any data type. The **axis labels** are collectively referred to as the index. The basic method to create a Series is to call:

s = pd.Series(data, index=index)

### Create a Series

In [2]:
# Use the Series method: s = pd.Series(data, index=index)
# Press Shift + Tab to see the other parameters

s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
s

a    0.273202
b    0.584901
c    0.203590
d    0.439210
e    0.430853
dtype: float64

In [3]:
# Index is optional

s = pd.Series(randn(5))  # Don't need np.random because randn was imported.
s

0    0.702335
1    0.525573
2    1.401937
3   -0.625349
4   -0.609826
dtype: float64

In [4]:
# A list, array or dictionary can be used to create a series.

my_list = [5,3,0]
my_arr = np.array([5,3,0])
my_dictionary = {'a':5,'b':3,'c':0}

In [5]:
# Use a list w/o an index

pd.Series(my_list)

0    5
1    3
2    0
dtype: int64

In [6]:
# Use a list w/ an index

pd.Series(my_list, index=['a','b','c'])

a    5
b    3
c    0
dtype: int64

In [14]:
# Use a list w/ a list variable for the index
# (a flat list; a nested list such as [['a','b','c']] would build a MultiIndex instead)
i_names = ['a','b','c']

pd.Series(my_list, i_names)

a    5
b    3
c    0
dtype: int64

In [7]:
# Use an array
my_arr = np.array([5,3,0])
pd.Series(my_arr, index=['a','b','c'])

a    5
b    3
c    0
dtype: int64

In [12]:
# Use a dictionary
my_dictionary = {'a':5,'b':3,'c':0}
pd.Series(my_dictionary, index=['a','b','c'])

# What happens if the index list is changed to hold x,y and z?

a    5
b    3
c    0
dtype: int64
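
To answer the question in the cell above: when the data is a dictionary, pandas looks up each index label among the dictionary keys, so labels with no matching key come back as NaN. A minimal sketch, reusing `my_dictionary` from the notebook (the variable names `matched`, `unmatched` and `partial` are illustrative):

```python
import pandas as pd

my_dictionary = {'a': 5, 'b': 3, 'c': 0}

# Index labels match the dictionary keys: values are carried over
matched = pd.Series(my_dictionary, index=['a', 'b', 'c'])

# No label matches a key: every value is NaN
unmatched = pd.Series(my_dictionary, index=['x', 'y', 'z'])

# Mixed case: 'a' is found, 'x' is not
partial = pd.Series(my_dictionary, index=['a', 'x'])

print(unmatched)
print(partial)
```

Because NaN is a floating-point value, a Series that picks up missing labels this way no longer has an integer dtype.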

In [15]:
# Using strings
my_cities = ['Chicago','Atlanta','Boston']
pd.Series(my_cities, i_names)

a    Chicago
b    Atlanta
c     Boston
dtype: object

In [16]:
# Use the cities as the labels
my_cities = ['Chicago','Atlanta','Boston']
state = ['IL','GA','MA']
cities = pd.Series(state, my_cities)
cities

Chicago    IL
Atlanta    GA
Boston     MA
dtype: object

### Using the Series index


In [17]:
cities['Chicago']

'IL'

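In [18]:
# Access by position with .iloc; plain cities[0] on a labeled index is deprecated in recent pandas versions
cities.iloc[0]

'IL'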
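
Label-based and position-based lookups on a Series can be made explicit with `.loc` and `.iloc`. The sketch below rebuilds the `cities` Series so it runs on its own; the variable names for the results are illustrative.

```python
import pandas as pd

cities = pd.Series(['IL', 'GA', 'MA'], index=['Chicago', 'Atlanta', 'Boston'])

by_label = cities.loc['Chicago']     # lookup by index label
by_position = cities.iloc[0]         # lookup by integer position

# Label slices include the end label; position slices exclude the end position
label_slice = cities.loc['Chicago':'Atlanta']
position_slice = cities.iloc[0:2]

print(by_label, by_position)
print(label_slice)
```

Both slices return the Chicago and Atlanta entries, even though one names its end point and the other stops just before it.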

## Dataframes

**DataFrame** is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. 
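
The "dict of Series" view can be made concrete with a small sketch. The day labels and the `fruit` name are assumptions for illustration; the point is that each Series becomes a column and the rows are aligned on the index labels.

```python
import pandas as pd

# Two Series with overlapping but not identical indexes
apples = pd.Series([3, 2, 0], index=['Mon', 'Tue', 'Wed'])
oranges = pd.Series([0, 3, 7, 2], index=['Mon', 'Tue', 'Wed', 'Thu'])

# Each dictionary key becomes a column name; rows are the union of the indexes
fruit = pd.DataFrame({'apples': apples, 'oranges': oranges})

print(fruit)
print(fruit.dtypes)
```

`apples` has no entry for `'Thu'`, so that cell is filled with NaN and the column is stored as floating point.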

### Create a DataFrame

In [19]:
np.random.seed(1234)  
df = pd.DataFrame(randn(4,5),index=['IL','GA','MA','VT'],columns=['Sent','Used','Expired','Lost','Destroyed'])
df

        Sent      Used   Expired      Lost  Destroyed
IL  0.471435 -1.190976  1.432707 -0.312652  -0.720589
GA  0.887163  0.859588 -0.636524  0.015696  -2.242685
MA  1.150036  0.991946  0.953324 -2.021255  -0.334077
VT  0.002118  0.405453  0.289092  1.321158  -1.546906


In [31]:
# A little shortcut
np.random.seed(1234)  
# split() with no argument splits on whitespace, so the labels carry no stray spaces
df = pd.DataFrame(randn(4,5),index='IL GA MA VT'.split(),columns='S U E L D'.split())
df

           S         U         E         L         D
IL  0.471435 -1.190976  1.432707 -0.312652 -0.720589
GA  0.887163  0.859588 -0.636524  0.015696 -2.242685
MA  1.150036  0.991946  0.953324 -2.021255 -0.334077
VT  0.002118  0.405453  0.289092  1.321158 -1.546906


In [21]:
# Create a DataFrame

data = {
    'apples': [3, 2, 0, 1], 
    'oranges': [0, 3, 7, 2]
}
sales = pd.DataFrame(data)
sales

   apples  oranges
0       3        0
1       2        3
2       0        7
3       1        2


In [22]:
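# sales has only two columns (positions 0 and 1), so the column slice 2:3 selects nothing
sales.iloc[1:2,2:3]

Empty DataFrame
Columns: []
Index: [1]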


In [23]:
df.head()

           S         U         E         L         D
IL  0.471435 -1.190976  1.432707 -0.312652 -0.720589
GA  0.887163  0.859588 -0.636524  0.015696 -2.242685
MA  1.150036  0.991946  0.953324 -2.021255 -0.334077
VT  0.002118  0.405453  0.289092  1.321158 -1.546906


### Using the DataFrame index

**.loc[] and .iloc[]**

In [24]:
df

           S         U         E         L         D
IL  0.471435 -1.190976  1.432707 -0.312652 -0.720589
GA  0.887163  0.859588 -0.636524  0.015696 -2.242685
MA  1.150036  0.991946  0.953324 -2.021255 -0.334077
VT  0.002118  0.405453  0.289092  1.321158 -1.546906


In [25]:
# Select a column
df['S']

IL    0.471435
GA    0.887163
MA    1.150036
VT    0.002118
Name: S, dtype: float64

In [26]:
# Select multiple columns
df[['S','E']]            # Outer brackets: the indexing operator, which expects an argument; inner brackets: the list of column names passed in

           S         E
IL  0.471435  1.432707
GA  0.887163 -0.636524
MA  1.150036  0.953324
VT  0.002118  0.289092


In [27]:
df

           S         U         E         L         D
IL  0.471435 -1.190976  1.432707 -0.312652 -0.720589
GA  0.887163  0.859588 -0.636524  0.015696 -2.242685
MA  1.150036  0.991946  0.953324 -2.021255 -0.334077
VT  0.002118  0.405453  0.289092  1.321158 -1.546906


In [28]:
# Getting a row
df.loc['IL']

S    0.471435
U   -1.190976
E    1.432707
L   -0.312652
D   -0.720589
Name: IL, dtype: float64

In [29]:
# Get the same row by integer position
df.iloc[0]

S    0.471435
U   -1.190976
E    1.432707
L   -0.312652
D   -0.720589
Name: IL, dtype: float64

In [30]:
# Slice rows by position; the end position (3) is excluded
df.iloc[1:3]

           S         U         E         L         D
GA  0.887163  0.859588 -0.636524  0.015696 -2.242685
MA  1.150036  0.991946  0.953324 -2.021255 -0.334077
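
Both indexers also accept a row selector and a column selector together. The following sketch rebuilds the same `df` (same seed and labels as above) so it runs on its own; the result variable names are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(4, 5),
                  index=['IL', 'GA', 'MA', 'VT'],
                  columns=['S', 'U', 'E', 'L', 'D'])

# A single cell: by labels, then by integer positions
one_value = df.loc['GA', 'U']
same_value = df.iloc[1, 1]

# A block of rows and columns: label slices include the end label,
# position slices exclude the end position
block = df.loc['GA':'MA', ['S', 'E']]
same_block = df.iloc[1:3, [0, 2]]

print(one_value)
print(block)
```

`.loc` is usually the safer choice when the row labels carry meaning, since it keeps working if the rows are reordered; `.iloc` depends only on the current order.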
