___

<a href='https://www.youtube.com/FallinPython'> <img src="../_images/FallinPython_Jupyter-01.jpg" width="750" height="400" align="center"/></a>
___

In [None]:
import pandas as pd
import numpy as np

# Pandas Library
* Website: https://pandas.pydata.org/
* Install Pandas: https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html
* Documentation: https://pandas.pydata.org/docs/
* User Guide: https://pandas.pydata.org/docs/user_guide/index.html#user-guide

___
# Content

[Series Data Structure Definition](#section_Series)<br>
[How to Create Series](#section_createSeries)<br>
[Basic Operations on Series](#section_BasicOperation)<br>
[Arithmetic Operations on Series](#section_arithmeticOperation)<br>
[Series Methods and Attributes](#section_methodsAtributtes)<br>
___

<a id='section_Series'></a>

# 1. Series

A `Series` is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). <br>
A pandas `Series` behaves a bit like a NumPy array and a bit like a Python dictionary.
* Series as NumPy array: pandas is built on top of NumPy for high-performance array computing. We can get and set values in a `Series` by integer position.
* Series as Python dictionary: we can get and set values by index label.
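
The two behaviours can be sketched as follows. The values, the labels and the name `temperatures` are invented for illustration only:

```python
import pandas as pd

# Illustrative data: values and labels are made up for this sketch
temperatures = pd.Series([21.5, 23.0, 19.8], index=["mon", "tue", "wed"])

# Array-like: access by integer position with .iloc
first_by_position = temperatures.iloc[0]

# Dictionary-like: access by index label
first_by_label = temperatures["mon"]

# Dictionary-like: the `in` operator checks the labels, not the values
has_tuesday = "tue" in temperatures

print(first_by_position, first_by_label, has_tuesday)
```

Both lookups return the same element here because the label `"mon"` sits at position 0.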

<img src="../_images/pandas_data_structures.png" width="1000" height="400" align="center" /><br>
___

<a id='section_createSeries'></a>

## 1.1 Creating Series

There are several ways to create a pandas `Series`; however, <u>it's not very common</u> to create one manually when working with pandas. More often you load data into pandas and manipulate it, ending up with either a `DataFrame` or a `Series`.

<img src="../_images/common_usage_pandas.png" width="750" height="400" align="center" />

The basic way to create a `Series` is to call the `Series` constructor from pandas:

```python
pd.Series(
    data=None,
    index=None,
    dtype=None,
    name=None,
    copy=False,
    fastpath=False,
)
```

In [None]:
my_series = pd.Series([100, 200, 300], index=["Monday", "Tuesday", "Thursday"], name="Sales")
my_series

We can pass different data types to the `data` parameter to create a pandas `Series`:
* List or tuple
* NumPy array
* Scalar value
* Dictionary

**Creating Series from List**

In [None]:
data = [10, 20, 30, 40, 50]
my_series = pd.Series(data, index=list('abcde'))  # equivalent: index=[x for x in 'abcde']
print(my_series)

**Creating Series from a NumPy Array**

In [None]:
data = np.arange(10,60,10)
my_series = pd.Series(data)
print(my_series)

**Creating Series from a Scalar Value**

In [None]:
index_list = ["a", "b", "c", "d", "e"]
my_series = pd.Series(13, index=index_list)   # the scalar is broadcast to every index label; name="scalar value" is optional
print(my_series)

**Creating Series from a Dictionary**

In [None]:
data = {"D": 10, "B": 20, "C": 30, "A": 40, "E": 50}
my_series = pd.Series(data,name="Values")
print(my_series)
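
When a dictionary is passed together with an explicit `index`, pandas looks up each index label among the dictionary keys. The sketch below uses a shortened version of the dictionary above and a made-up label `"Z"` that has no matching key:

```python
import pandas as pd

data = {"D": 10, "B": 20, "C": 30}

# index= selects and orders the entries by label;
# "Z" is not a key of the dictionary, so its value becomes NaN
my_series = pd.Series(data, index=["B", "C", "Z"], name="Values")
print(my_series)
```

Because `NaN` is a floating point value, the resulting `Series` is stored as floats rather than integers.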

<a id='section_BasicOperation'></a>

## 1.2 Basic Operations on Series 

### Accessing / setting elements

In [None]:
my_dict = {"a": 10, "b": 20, "c": 30, "d": 40, "e": 50}
my_series = pd.Series(my_dict)
print(my_series)

In [None]:
# retrieving a single element => using integer position
# (my_series[0] is deprecated when the index holds labels, so use .iloc)
my_series.iloc[0]

In [None]:
# retrieving a single element => using index label
my_series["a"]

In [None]:
# retrieving n-elements => using integer positions
# (position-based lists need .iloc when the index holds labels)
my_series.iloc[[0,3,-1]]

In [None]:
# retrieving n-elements => using index label
my_series[["a","d","e"]]

### Slicing pandas series

In [None]:
my_series

In [None]:
# slicing using integer positions => (numpy-like, stop position excluded)
my_series[0:3]

In [None]:
# slicing using index labels => (dictionary-like, stop label included)
my_series["a":"c"]

**Important**:<br>
There is a more explicit way to select elements of a pandas `Series`: the `.loc` and `.iloc` indexers, which we will cover in the **Series Methods and Attributes** section. They are worth learning now because they work the same way on a pandas `DataFrame`.
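
As a short preview of those indexers, the sketch below contrasts the two slicing rules on the same `Series` used above: a positional slice excludes its stop position, while a label slice includes its stop label.

```python
import pandas as pd

my_series = pd.Series({"a": 10, "b": 20, "c": 30, "d": 40, "e": 50})

# Positional slice: position 2 (label "c") is excluded
by_position = my_series.iloc[0:2]

# Label slice: label "c" (position 2) is included
by_label = my_series.loc["a":"c"]

print(by_position.tolist())
print(by_label.tolist())
```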

### Boolean Filter

In [None]:
my_array = np.arange(10,55,5)
my_series = pd.Series(my_array)
print(my_series)

In [None]:
# single condition
filter_ = (my_series > 30)
my_series[filter_]

In [None]:
# multiple conditions with and operator
filter_ = (my_series > 20) & (my_series <= 40)
my_series[filter_]

In [None]:
# multiple conditions with or operator
filter_ = (my_series < 20) | (my_series >= 40)
my_series[filter_]
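
It can help to look at the filter itself. The sketch below, using the same data as above, shows that a comparison produces a boolean `Series` and that `~` inverts it. The variable name `mask` is just an illustrative choice:

```python
import numpy as np
import pandas as pd

my_series = pd.Series(np.arange(10, 55, 5))  # 10, 15, ..., 50

# The comparison returns a boolean Series with the same index
mask = my_series > 30
print(mask.dtype)

# ~ negates the mask, keeping the elements where the condition is False
not_selected = my_series[~mask]
print(not_selected.tolist())
```

The parentheses around each condition in the cells above are required because `&` and `|` bind more tightly than comparison operators in Python.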

<a id='section_arithmeticOperation'></a>

## 1.3 Arithmetic operations on Series 

Similar to a NumPy array, you can perform arithmetic operations on a pandas `Series` and even between two `Series`.

### Arithmetic Operation on a single Series

In [None]:
series_1 = pd.Series([10,20,30,40,50])
series_1

In [None]:
series_1**5

### Arithmetic Operation between Series

Arithmetic operations are applied element-wise, and pandas aligns the two `Series` on their index labels before computing.
The detail to pay attention to is whether both `Series` have the same index:

<img src="../_images/series_arithmetic_same_index.png" width="550" height="400" align="left" />

In [None]:
series_1 = pd.Series([10,20,30,40,50])
series_2 = pd.Series([30,70,100,120,150])

In [None]:
series_1*series_2

<img src="../_images/series_arithmetic_different_index.png" width="550" height="400" align="left" />

In [None]:
series_1 = pd.Series([10,20,30,40,50])
series_2 = pd.Series([30,70,100])

In [None]:
series_1+series_2

**Important**:<br>
If you perform an arithmetic operation between `Series` that have different indexes, you will end up with `NaN` (Not a Number) wherever a label is present in only one of them.
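
If `NaN` is not the result you want, one option is the `add()` method with its `fill_value` parameter. This is a minimal sketch on the same two `Series`; whether 0 is a sensible substitute depends on your data:

```python
import pandas as pd

series_1 = pd.Series([10, 20, 30, 40, 50])
series_2 = pd.Series([30, 70, 100])

# The + operator leaves NaN where a label exists in only one Series
plain_sum = series_1 + series_2

# add() with fill_value=0 treats the missing entries as 0 instead
filled_sum = series_1.add(series_2, fill_value=0)

print(plain_sum.tolist())
print(filled_sum.tolist())
```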

<a id='section_methodsAtributtes'></a>

## 1.4 Series Methods and Attributes - (Good to know)

* Check them out in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/series.html

A pandas `Series` has plenty of methods and attributes. I will not go through all of them, but I will point out some useful ones that will help us during this course. You can check the complete list in the online documentation linked above or with the Python built-in function `dir`.

In [None]:
dir(pd.Series([1,2,3]))

In [None]:
len([x for x in dir(pd.Series([1,2,3])) if not x.startswith("__")])

**How to get the index?**

In [None]:
my_dict = {"a": 10, "b": 20, "c": 30, "d": 40, "e": 50}
my_series = pd.Series(my_dict)
print(my_series)

In [None]:
my_series.index.to_list()

In [None]:
# how to rename the index (all labels at once)
my_series.index = ["Brazil","Germany","France","Belgium","Morocco"]
my_series

In [None]:
# how to rename one or several index labels
# my_series.index[0] = 'England'  # raises TypeError: an Index is immutable
my_series.rename({"Brazil":"England", "Belgium":"USA"}, inplace=True)
my_series
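
For comparison, here is a small sketch of `rename()` without `inplace=True`, using a shortened, illustrative `Series`. In that case the method returns a new `Series` and the original keeps its labels:

```python
import pandas as pd

my_series = pd.Series([10, 20, 30], index=["Brazil", "Germany", "France"])

# Without inplace=True, rename() returns a new Series
renamed = my_series.rename({"Brazil": "England"})

print(my_series.index.to_list())  # original labels are unchanged
print(renamed.index.to_list())
```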

**How to get and set values?**

In [None]:
# accessing the values from a series
my_series.to_numpy()

In [None]:
# you can set values in a Series by integer position (.iloc) or by label
my_series.iloc[0] = 1000
my_series["France"] = 500
my_series

**Indexers: `.loc[]` and `.iloc[]`**

In [None]:
# creating a series from a dictionary
my_dict = {"Brazil": 6, "Germany": 7, "France": 8, "Belgium": 9, "Morocco": 10}
series_1 = pd.Series(my_dict)
series_1

In [None]:
# .iloc selects elements by integer position
series_1.iloc[[0,2,-1]]

In [None]:
# .loc selects elements by index label
series_1.loc[["Brazil","France","Morocco"]]

In [None]:
# possible to combine .loc with boolean filter
series_1.loc[series_1 > 7]

In [None]:
# setting values using .loc (.iloc works the same way with integer positions)
series_1.loc[series_1 > 7] = 10
series_1
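
The difference between the two indexers is easiest to see when the labels are themselves integers. The sketch below uses a hypothetical `Series` named `scores` whose labels deliberately do not match the positions:

```python
import pandas as pd

# Hypothetical data: integer labels in a different order than the positions
scores = pd.Series([6, 7, 8], index=[2, 0, 1])

by_label = scores.loc[0]      # the element whose label is 0
by_position = scores.iloc[0]  # the element at the first position

print(by_label, by_position)
```

Here `.loc[0]` returns 7 and `.iloc[0]` returns 6, which is why it pays to state explicitly which kind of lookup you mean.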

**Methods: `count()` and `value_counts()`**

In [None]:
import random

# let's create a series with n randomly chosen elements
n = 500
countries = ["Brazil", "Germany", "France", "Belgium", "Morocco"]
list_countries = [random.choice(countries) for _ in range(n)]
series_1 = pd.Series(list_countries)
series_1

In [None]:
# method: count()
# Question: How many non-missing entries are there?
series_1.count()

In [None]:
# method: value_counts()
# Question: How many times does each country appear?
series_1.value_counts(normalize=False)
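
Setting `normalize=True` returns relative frequencies instead of absolute counts. The sketch below uses a small fixed sample (illustrative data, not the random series above) so that the result is reproducible:

```python
import pandas as pd

# Small fixed sample so the numbers are easy to check by hand
sample = pd.Series(["Brazil", "Germany", "Brazil", "France", "Brazil"])

counts = sample.value_counts()                # absolute counts
shares = sample.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares)
```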

**Methods: `nunique()` and `unique()`**

In [None]:
series_1

In [None]:
# method: nunique()
# Question: How many unique countries are there?
series_1.nunique()

In [None]:
# method: unique()
# Question: What are the unique countries?
series_1.unique()

**Method: `describe()`**

In [None]:
# this method works for both categorical and numerical data
series_1.describe()
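
The summary returned by `describe()` depends on the kind of data. A minimal sketch with two small, made-up `Series`, one numerical and one of strings:

```python
import pandas as pd

numbers = pd.Series([10, 20, 30, 40, 50])
labels = pd.Series(["a", "b", "a", "c", "a"])

# Numerical data: count, mean, std, min, quartiles, max
numeric_summary = numbers.describe()

# Text data: count, unique, top (most frequent value), freq
text_summary = labels.describe()

print(numeric_summary)
print(text_summary)
```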

**Methods: `min()`, `max()`, `idxmin()`, `idxmax()`, `sum()`, `cumsum()`, `mean()`**

In [None]:
my_dict = {"a": 10, "b": 20, "c": 30, "d": 40, "e": 50}
my_series = pd.Series(my_dict)
print(my_series)

In [None]:
# methods: idxmin(), idxmax()
# Question: What are the minimum and maximum values, and at which index labels do they occur?
print(my_series.min(), my_series.idxmin())
print(my_series.max(), my_series.idxmax())

In [None]:
# methods: min(), max(), mean(), sum()
print(my_series.min())
print(my_series.max())
print(my_series.mean())
print(my_series.sum())

In [None]:
my_series

In [None]:
# method: cumsum()
my_series.cumsum()
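
As a quick check of what `cumsum()` computes, the sketch below repeats the same `Series` and compares the running total with `sum()`:

```python
import pandas as pd

my_series = pd.Series({"a": 10, "b": 20, "c": 30, "d": 40, "e": 50})

# Each element is the sum of all values up to and including that position
running_total = my_series.cumsum()
print(running_total.tolist())

# The last running total equals the overall sum
print(running_total.iloc[-1] == my_series.sum())
```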

There are many more `Series` methods, and we will get to know more of them during this course.