# Python Pandas

The name is a portmanteau of panel + data. Panel data is multi-dimensional data that varies over time.

Pandas provides fast, flexible, and expressive data structures that make it easy to work with relational or labeled data for data analysis. It relies heavily on NumPy and is built directly on top of it.

In [4]:
import numpy as np
import pandas as pd

### Exercise One - A Series

A __series__ is what you'd get if a Python dictionary and a NumPy array got together and had a kid: it inherits the qualities of both. Like a NumPy array, it has a data type that you can specify. Like a dictionary, it maps keys to values; the keys are called labels, and together they make up the index. Unlike the keys of a dictionary, Pandas labels don't have to be unique. Series methods also take missing data into account during their calculations.
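
A minimal sketch of the last two points, non-unique labels and missing-data handling. The values and labels here are made up for illustration and are not part of the exercise data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the label 'a' appears twice, and one value is missing
s = pd.Series([1.0, 2.0, np.nan, 4.0], index=['a', 'a', 'b', 'c'])

# Looking up a duplicated label returns every matching entry as a Series
dupes = s['a']
print(len(dupes))

# Reductions skip NaN by default (skipna=True)
total = s.sum()
print(total)

# Passing skipna=False propagates the missing value instead
total_strict = s.sum(skipna=False)
print(total_strict)
```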

#### Creating a Series from a Python Dictionary
There are a couple of ways to create a new __series__ from scratch. The first is from a dictionary:

In [5]:
test_balance_data = {
    'pasan': 20.00,
    'treasure': 20.18,
    'ashley': 1.05,
    'craig': 42.42,
}

# The series constructor accepts any dict-like object
balances = pd.Series(test_balance_data)
balances

pasan       20.00
treasure    20.18
ashley       1.05
craig       42.42
dtype: float64

#### Creating a Series from an Iterable
I can also create a series from any iterable. __Note__ that when labels are not present, they default to incrementing integers starting at 0.

In [6]:
unlabeled_balances = pd.Series([20.00, 20.18, 1.05, 42.42])
unlabeled_balances

0    20.00
1    20.18
2     1.05
3    42.42
dtype: float64

I can also provide the `index=` argument, which takes an iterable of labels the same size as the data. The labels are paired with the values in order, so the first label goes with the first value, and so on.

In [8]:
labeled_balances = pd.Series(
    [20.00, 20.18, 1.05, 42.42],
    index=['pasan', 'treasure', 'ashley', 'craig']
)
labeled_balances

pasan       20.00
treasure    20.18
ashley       1.05
craig       42.42
dtype: float64
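
The size requirement can be checked with a small sketch. The second half shows how `index=` behaves with dict data, where it selects and orders entries by label; the label `'kenneth'` is a hypothetical name that is not in the exercise data:

```python
import pandas as pd

values = [20.00, 20.18, 1.05, 42.42]

# With list data, the index must have exactly as many labels as values
try:
    pd.Series(values, index=['pasan', 'treasure'])
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)

# With dict data, index= instead selects and orders entries by label;
# 'kenneth' (hypothetical) has no entry in the dict, so its value is NaN
by_label = pd.Series(
    {'pasan': 20.00, 'craig': 42.42},
    index=['craig', 'pasan', 'kenneth']
)
print(by_label)
```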

A NumPy array is also iterable, so I can create a new Series from an ndarray. 

In [9]:
ndbalances = np.array([20.00, 20.18, 1.05, 42.42])
pd.Series(ndbalances)

0    20.00
1    20.18
2     1.05
3    42.42
dtype: float64
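
As a small sketch of how the array's data type carries over, the example below builds the same balances as `float32`. The names `nd32` and `s32` are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

# An explicit dtype on the array is kept by the Series
nd32 = np.array([20.00, 20.18, 1.05, 42.42], dtype=np.float32)
s32 = pd.Series(nd32)
print(s32.dtype)

# to_numpy() hands the values back as an ndarray
round_trip = s32.to_numpy()
print(type(round_trip))
```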

Finally, I can create a Series from a scalar and an index. Because a scalar is a single value, it is broadcast to each of the labels specified in the `index` keyword argument.

In [10]:
pd.Series(20.00, index=["guil", "jay", "james", "ben", "nick"])

guil     20.0
jay      20.0
james    20.0
ben      20.0
nick     20.0
dtype: float64
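
A quick sketch to confirm the broadcast, and to show what happens when a scalar is passed without an index. The variable names are illustrative:

```python
import pandas as pd

names = ["guil", "jay", "james", "ben", "nick"]
starting = pd.Series(20.00, index=names)

# One entry per label, each equal to the scalar
print(len(starting) == len(names))
print((starting == 20.00).all())

# Without index=, a scalar yields a single-element Series labeled 0
single = pd.Series(20.00)
print(single)
```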

__Note__ that the labels are autogenerated when no index is specified.

### Exercise Two - Accessing a Series

There are multiple ways to access the data. The series is indexed by username: the label is the username, and the value is that user's cash balance. A series is also ordered and zero-based, so I can access it by position just like I would a list or an array.

In [11]:
balances

pasan       20.00
treasure    20.18
ashley       1.05
craig       42.42
dtype: float64

In [14]:
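# .iloc makes the positional lookup explicit; plain balances[0] is deprecated in recent pandas
balances.iloc[0]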

20.0

In [16]:
type(balances.iloc[0])

numpy.float64

The value is wrapped in a NumPy scalar so that it keeps its data type and works with other NumPy data types and structures.

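Positional indexing works just like a standard list. The indices start at 0, and negative numbers can be used to access values from the end. In recent pandas versions, plain integer keys on a labeled Series are deprecated, so `.iloc` is the explicit way to index by position.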

In [17]:
# Accessing the last balance by position
balances.iloc[-1]

42.42
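
Positional access also supports slicing. The sketch below rebuilds `balances` so it runs on its own and uses `.iloc`, which always indexes by position regardless of the labels:

```python
import pandas as pd

balances = pd.Series({
    'pasan': 20.00,
    'treasure': 20.18,
    'ashley': 1.05,
    'craig': 42.42,
})

first = balances.iloc[0]
last = balances.iloc[-1]

# A slice returns a new Series and keeps the labels
first_two = balances.iloc[:2]
print(first_two)
```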

#### Accessing by Label
Since a series is labelled, I can access it like I would a dict:

In [19]:
balances['pasan']

20.0
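
A few more dict-like operations, sketched on a rebuilt copy of `balances`. The label `'kenneth'` is hypothetical and stands in for a user who is not in the data:

```python
import pandas as pd

balances = pd.Series({
    'pasan': 20.00,
    'treasure': 20.18,
    'ashley': 1.05,
    'craig': 42.42,
})

# .loc is the explicit label-based accessor
via_loc = balances.loc['pasan']

# `in` tests membership against the labels, not the values
has_craig = 'craig' in balances
has_kenneth = 'kenneth' in balances

# .get returns a default instead of raising KeyError for a missing label
fallback = balances.get('kenneth', 0.0)
print(via_loc, has_craig, has_kenneth, fallback)
```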

In [20]:
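for label, value in balances.items():
    print("The label {} has a value of {}".format(label, value))

The label pasan has a value of 20.0
The label treasure has a value of 20.18
The label ashley has a value of 1.05
The label craig has a value of 42.42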