# Class 6: Introduction to Pandas pt.1

In this Jupyter Notebook you will learn about the basic workings of Pandas Series structures. Please work through this document's Python-3 code cells to experience the power of the Pandas library.

Pandas is a standard data science library for Python-3. It is built on top of the NumPy library, so anyone familiar with NumPy arrays should be able to pick up its data structures quickly. You can read more about Pandas Series in the Pandas online docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html

In [None]:
import pandas as pd

___
## Pandas Series

A **series** is a simple Pandas data structure used to store data in a linear fashion. Here are a few details about Pandas series:
* A series is meant to hold a single data type.
* Series are linear, which means each value has a numerical position.
* Each value has an index label, which is either its numerical position (default) or a custom value that identifies the piece of data.
* Series can be indexed in a number of ways.
* Series can be modified using standard Python-3 operators (arithmetic & boolean).
* Series have attributes attached to them, such as a name, a data type, and other information.
* Series have built-in methods that can quickly process the data points.
* Series are the base structure of Pandas data frames.
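
The properties listed above can be seen in a small sketch. The series `temps` and its values are hypothetical and used only for illustration; they are not part of the G7 data used later in this notebook.

```python
import pandas as pd

# Hypothetical sample values, used only for illustration.
temps = pd.Series(
    [20.5, 22.0, 19.5],
    index=["Mon", "Tue", "Wed"],
    name="Temperature (C)",
)

print(temps.dtype)    # one data type for the whole series
print(temps.index)    # the labels attached to the values
print(temps.iloc[0])  # access by numerical position
print(temps["Tue"])   # access by label
print(temps + 1)      # arithmetic is applied to every value
print(temps > 20)     # a comparison produces a boolean series
print(temps.mean())   # built-in methods summarize the data
```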


___
## Section 1:
### Creating Series

Here are a few ways to create Pandas series:
1. Python-3 dictionary
2. Hard coded data (data without custom index labels)

#### Python-3 dictionary

Dictionaries are Key:Value data structures. A dictionary can be passed directly to **pd.Series()**: the keys become the index labels and the values become the data.

In [None]:
g7_data = {
    "Canada": 35.467,
    "France": 63.951,
    "Germany": 80.940,
    "Italy": 60.665,
    "Japan": 127.061,
    "United Kingdom": 64.511,
    "United States": 318.523
}

G7_pop = pd.Series(g7_data)

# Display pd-series
G7_pop

#### Hard coded data (data without custom index labels)

In [None]:
G7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523])

G7_pop

From the cell above, you can see that the pd-series' index labels default to the numerical positions 0 through n-1.

You can assign specific labels to the values using the series' **.index** attribute.

In [None]:
G7_pop.index = [
    'Canada',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'United Kingdom',
    'United States',
]

G7_pop
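
As an alternative to assigning **.index** afterwards, the labels can be supplied when the series is created through the `index` parameter of `pd.Series`. The sketch below uses the same values as above; the variable names `populations`, `countries`, and `g7_labeled` are only illustrative.

```python
import pandas as pd

populations = [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523]
countries = [
    "Canada",
    "France",
    "Germany",
    "Italy",
    "Japan",
    "United Kingdom",
    "United States",
]

# The index list must have the same length as the data list.
g7_labeled = pd.Series(populations, index=countries)

print(g7_labeled)
```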

___
## Section 2:
### Naming a Series

Sometimes, for organization and data visualization, you want the series to have a name. You can set one with the **.name** attribute.

In [None]:
G7_pop.name = 'G7 Population in millions'

# Display the series info (and name)
G7_pop
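
The name can also be set at creation time with the `name` parameter of `pd.Series`. This is a minimal sketch on a shortened, two-country version of the data; the variable `g7_named` is only illustrative.

```python
import pandas as pd

# Shortened data set, used only to keep the example small.
g7_named = pd.Series(
    {"Canada": 35.467, "France": 63.951},
    name="G7 Population in millions",
)

print(g7_named.name)
```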

___
## Section 3:
### Getting data regarding the series

In [None]:
# Print a summary of the series (Series.info() requires Pandas 1.4 or newer)
G7_pop.info()

A pd-series generally holds a single data type. This matters because the processing you run on the data (for data science) expects the data to be homogeneous.

In [None]:
G7_pop.dtype

In [None]:
G7_pop.values

In [None]:
G7_pop.index

_Notice_: every expression above returns a value, so it can be stored in a variable or used in a conditional statement. In the cells above, the values are only displayed.
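
As a rough sketch of that idea, the attribute values can be kept in variables and tested in `if` statements. The series `g7` below is a shortened copy of the data, and the variable names are only illustrative.

```python
import pandas as pd

g7 = pd.Series({"Canada": 35.467, "France": 63.951, "Japan": 127.061})

labels = list(g7.index)  # store the index labels in a plain list
values = g7.values       # NumPy array holding the underlying data

# Attributes can be used as conditional values.
is_float = g7.dtype == "float64"
if is_float:
    print("The series stores floating point numbers.")

if "Japan" in g7.index:
    print("Japan is one of the labels:", labels)
```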

___
## Section 4:
### Indexing pd-series

**a.** Similar to a Python-3 dictionary, your programs can index the pd-series by index label.

In [None]:
G7_pop['Canada']

In [None]:
G7_pop['Japan']

**b.** Indexing by numerical position using the **.iloc[]** indexer (note the square brackets: it is not called like a method).

In [None]:
# index the first value
G7_pop.iloc[0]

In [None]:
# index the last value
G7_pop.iloc[-1]

**c.** Selecting multiple elements at one time.

In [None]:
G7_pop[["Italy", "France"]]

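**d.** Standard indexing (as you would a Python-3 list). When the series has custom labels, passing a plain integer position to `[]` is deprecated in recent Pandas versions, so use **.iloc[]** for positional access.

In [None]:
G7_pop.iloc[2]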
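
Both kinds of indexing also accept slices. The sketch below, which rebuilds the data under the illustrative name `g7`, shows one difference worth knowing: a positional slice with `.iloc` excludes its end position, while a label slice with `.loc` includes its end label.

```python
import pandas as pd

g7 = pd.Series({
    "Canada": 35.467,
    "France": 63.951,
    "Germany": 80.940,
    "Italy": 60.665,
    "Japan": 127.061,
    "United Kingdom": 64.511,
    "United States": 318.523,
})

# Positions 1 and 2 (the end position 3 is excluded).
by_position = g7.iloc[1:3]

# Labels from France through Italy (the end label is included).
by_label = g7.loc["France":"Italy"]

print(by_position)
print(by_label)
```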

___
## Section 5:
### Modifying Series

**a.** Changing a value by index label

In [None]:
G7_pop["Italy"] = 61.899

In [None]:
G7_pop

**b.** Updating the value at a specific label in place with an augmented assignment operator (here, subtracting 20)

In [None]:
G7_pop['Canada'] -= 20

In [None]:
G7_pop

**c.** Modifying a value by position using the **.iloc[]** indexer

In [None]:
G7_pop.iloc[-1] = 500

**d.** Modifying every value that meets a condition (conditional indexing)

In [None]:
G7_pop[G7_pop < 70] = 90

In [None]:
G7_pop
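
The assignments above change the series in place. If the original values are still needed, one option is to work on a copy, or to use `Series.where`, which returns a new series. This is a sketch on a shortened data set; the names `g7`, `adjusted`, and `replaced` are only illustrative.

```python
import pandas as pd

g7 = pd.Series({"Canada": 35.467, "Germany": 80.940, "Japan": 127.061})

# Option 1: modify a copy so the original stays untouched.
adjusted = g7.copy()
adjusted[adjusted < 70] = 90

# Option 2: where() keeps values that meet the condition
# and replaces all other values with 90.
replaced = g7.where(g7 >= 70, 90)

print(g7)        # original values are unchanged
print(adjusted)
print(replaced)
```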

___
## Section 6:
### Boolean Series

In [None]:
# reset series
g7_data = {
    "Canada": 35.467,
    "France": 63.951,
    "Germany": 80.940,
    "Italy": 60.665,
    "Japan": 127.061,
    "United Kingdom": 64.511,
    "United States": 318.523
}

G7_pop = pd.Series(g7_data)

**a.** Comparing a series to a value produces a boolean series: each label is marked True or False depending on whether its value meets the condition

In [None]:
G7_pop > 80

**b.** Creating sub-series of values that meet a conditional statement

In [None]:
G7_pop[G7_pop > 80]

**c.** Creating sub-series of values that meet a compound (2+) conditional statement, combined with `|` (or) and `&` (and)

In [None]:
G7_pop[(G7_pop > 80) | (G7_pop < 40)]

In [None]:
G7_pop[(G7_pop > 80) & (G7_pop < 200)]
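
The parentheses around each condition are required because `|` and `&` bind more tightly than the comparison operators. One way to keep compound conditions readable is to store each boolean series in a variable first. The sketch below also uses `~` to negate a condition; the names `g7`, `large`, `small`, `either`, and `neither` are only illustrative.

```python
import pandas as pd

g7 = pd.Series({
    "Canada": 35.467,
    "France": 63.951,
    "Germany": 80.940,
    "Italy": 60.665,
    "Japan": 127.061,
    "United Kingdom": 64.511,
    "United States": 318.523,
})

large = g7 > 80  # boolean series: True where the value exceeds 80
small = g7 < 40  # boolean series: True where the value is below 40

either = g7[large | small]      # values that meet at least one condition
neither = g7[~(large | small)]  # ~ negates: values that meet no condition

print(either)
print(neither)
```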