***
# 4.1 Pandas Series

- Pandas Documentation: https://pandas.pydata.org/


***
### Python 4.01. Series
### Python 4.02. Pandas DataFrame, Selection, and Indexing
### Python 4.03. Configuring Options, Data Type Conversion, Working with Strings and Dates, Missing Data
### Python 4.04. Groupby, Categorizing, and Labeling Data
### Python 4.05. Merging, Joining, and Concatenating
### Python 4.06. Pipe, Apply, Applymap, Map, Pivot Table, and Contingency Table
### Python 4.07. Data Input and Output
### Python 4.08. Data Visualization
### Python 4.09. Exploratory Data Analysis and Beyond
### Python 4.10. Breakout Group Exercise and Solution
***

![image.png](attachment:image.png)

### Pandas Installation: Go to the command line or terminal and use either:
```bash
conda install pandas

pip install pandas
```



## Pandas Introduction

#### Why is it named after an East Asian bear?
  - Created by Wes McKinney in 2008 while he was working at the hedge fund AQR Capital Management
  - The name is a mash-up of `Panel Data`, an econometrics term for multidimensional structured datasets

  
#### Python already has data structures to handle data; why do we need another one?
  - Built-in data structures (such as lists) are not designed for scientific computing
  - The Numpy (Numerical Python) library provides arrays, which contain homogeneous data (every element has the same type)
  - Pandas builds more functionality on top of Numpy, such as labeled axes and tabular data structures


#### What is Pandas?
  - You can think of pandas as an `extremely powerful version of Excel, with a lot more features`.
  - Pandas is an open-source library built on top of Numpy, with a reputation as an all-round, one-stop shop for data analytics:
       - It allows for fast analysis and data preparation: data transformation, data cleaning, data extraction, data visualization, etc.
       - It provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation, etc.
       - It integrates well with other popular data science libraries such as Numpy, Scikit-learn, Statsmodels, Matplotlib, and Seaborn
  - It is nearly self-contained: lots of functionality is built into one package
  - It is one of the most intuitive Python packages for working with data, adding data structures and tools designed for table-like (tabular) data:
       - Two-dimensional data of rows and columns, i.e., a table
       - Data stored in many formats, such as CSV, JSON, XML, Parquet, and many others; pandas can read nearly all of them, returning a DataFrame
  - It provides tools for handling missing data
  - It combines performance (vectorized operations backed by Numpy) with productivity (concise, high-level syntax)


## Table of Contents - Pandas Series
### 1. Series() Function
### 2. Data Types in Series
### 3. Using an Index


The first main data type we will learn about in `Pandas` is the Series. A `Pandas Series` is very similar to a `Numpy Array`; in fact, it is built on top of the Numpy array object. What differentiates a Pandas Series from a Numpy array:

1) A Pandas Series can have axis labels, i.e., it can be indexed by a **label** instead of just a numeric position

2) It doesn't need to hold numeric data; it can hold **any arbitrary Python object**

In the Numpy package, `array` is the term used for vectors, matrices, and higher-dimensional data sets whose elements all share **the same data type**.
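A minimal sketch of the first difference, label-based indexing, assuming only Numpy and pandas are installed. The grocery labels and prices are illustrative values, not part of the course data.

```python
import numpy as np
import pandas as pd

# A Numpy array: elements share one dtype and are addressed by position only
arr = np.array([1.5, 2.0, 0.75])
print(arr[1])             # 2.0

# A Series wraps the same kind of data but adds labels
s = pd.Series([1.5, 2.0, 0.75], index=['apple', 'bread', 'milk'])
print(s['bread'])         # 2.0

# Vectorized arithmetic works as it does on arrays, and the labels are kept
print((s * 2)['milk'])    # 1.5
```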

![image.png](attachment:image.png)

In [29]:
# Import Pandas and explore the Series object.
import numpy as np
import pandas as pd

In [30]:
pd.__version__

'1.5.0'

### 1. `Series()` function: creating a Series

We can convert a list, Numpy array, or dictionary to a Series:

In [57]:
labels = ['label1','label2','label3','label4']
my_list = [100,200,300,400]
arr = np.array(my_list)
d = {'label1':100,'label2':200,'label3':300,'label4':400}

In [58]:
labels

['label1', 'label2', 'label3', 'label4']

In [59]:
my_list

[100, 200, 300, 400]

In [60]:
arr

array([100, 200, 300, 400])

In [61]:
d

{'label1': 100, 'label2': 200, 'label3': 300, 'label4': 400}

In [62]:
# Convert a list to a Series (a default integer index 0, 1, 2, ... is created)
pd.Series(data=my_list)

0    100
1    200
2    300
3    400
dtype: int64

In [63]:
# Add labels to create a labeled index. In Jupyter, Shift+Tab shows the docstring
pd.Series(data=my_list,index=labels)

label1    100
label2    200
label3    300
label4    400
dtype: int64

In [64]:
pd.Series(my_list,labels) # positional arguments work as long as they are in the order (data, index)

label1    100
label2    200
label3    300
label4    400
dtype: int64
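Passing arguments positionally relies on the constructor's order (`data` first, then `index`), and the index must supply exactly one label per value. A short sketch; the labels 'a' to 'd' are arbitrary placeholders.

```python
import pandas as pd

my_list = [100, 200, 300, 400]

# Positional arguments map to (data, index), in that order
s = pd.Series(my_list, ['a', 'b', 'c', 'd'])
print(s['c'])  # 300

# The index must have as many labels as there are values
try:
    pd.Series(my_list, ['a', 'b'])
except ValueError as err:
    print('ValueError:', err)
```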

In [65]:
# Convert a Numpy array to a Series
pd.Series(arr)

0    100
1    200
2    300
3    400
dtype: int32

In [66]:
# Add labels to the Series
pd.Series(arr,labels)

label1    100
label2    200
label3    300
label4    400
dtype: int32
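The `int32` above is the platform's default Numpy integer type (for example, on Windows with Numpy versions before 2.0), so the same cell may show `int64` on another machine, while the list-based Series earlier shows `int64`. When the exact type matters, the `dtype` parameter can pin it. A minimal sketch:

```python
import numpy as np
import pandas as pd

arr = np.array([100, 200, 300, 400])
labels = ['label1', 'label2', 'label3', 'label4']

# Force a 64-bit integer dtype regardless of the platform default
s64 = pd.Series(arr, index=labels, dtype='int64')
print(s64.dtype)  # int64

# Convert back to a plain Numpy array when needed
back = s64.to_numpy()
print(type(back), back.dtype)
```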

#### Converting a Dictionary to a Series

In [67]:
d

{'label1': 100, 'label2': 200, 'label3': 300, 'label4': 400}

In [68]:
pd.Series(d)

label1    100
label2    200
label3    300
label4    400
dtype: int64
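When converting a dictionary, its keys become the index. Also passing `index` selects and orders the keys to keep; a label with no matching key gets NaN, and because NaN is a floating-point value the result becomes `float64`. A brief sketch; `'label5'` is a made-up label that is deliberately missing from `d`.

```python
import pandas as pd

d = {'label1': 100, 'label2': 200, 'label3': 300, 'label4': 400}

# Keys become the index; `index` picks and orders which keys to keep
subset = pd.Series(d, index=['label2', 'label1', 'label5'])
print(subset)
# 'label5' is not a key in d, so its value is NaN and the dtype is float64
```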

### 2. Data Types in Series

A pandas Series can hold a variety of object types:

In [69]:
pd.Series(data=labels) # can hold strings (stored with dtype object)

0    label1
1    label2
2    label3
3    label4
dtype: object

In [70]:
# It can even hold functions (you are unlikely to need this, but it shows how flexible a pandas Series is)
pd.Series([sum,print,len])

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object
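A Series can also mix value types in one object. In that case pandas falls back to the generic `object` dtype, and each element keeps its own Python type. A short sketch with arbitrary sample values:

```python
import pandas as pd

# Values of different types fall back to the generic `object` dtype
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)  # object

# Each element keeps its own Python type
print([type(v).__name__ for v in mixed])
```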

### 3. Using an Index

The key to using a Series is understanding its index. Pandas uses these index labels or numbers for fast lookups of information (it works much like a dictionary).
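A minimal sketch of that dictionary-like behavior. It defines its own `series1` with the same seasonal labels used in this section; `'Fall'` is a hypothetical label that is intentionally absent.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['Spring', 'Summer', 'Autumn', 'Winter'])

# `in` checks the index labels (like dict keys), not the values
print('Summer' in series1)      # True
print(2 in series1)             # False: 2 is a value, not a label

# .get() returns a default instead of raising KeyError for a missing label
print(series1.get('Fall', 0))   # 0
```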

In [71]:
series1 = pd.Series([1,2,3,4],index = ['Spring', 'Summer','Autumn', 'Winter'])         
series1

Spring    1
Summer    2
Autumn    3
Winter    4
dtype: int64

In [72]:
series2 = pd.Series([1,2,5,4],index = ['Spring', 'Summer','AllSeasons', 'Winter'])       
series2

Spring        1
Summer        2
AllSeasons    5
Winter        4
dtype: int64

In [73]:
series3 = pd.Series(['Spring', 'Summer','AllSeasons', 'Winter'],index =  [1,2,5,4])  
series3

1        Spring
2        Summer
5    AllSeasons
4        Winter
dtype: object

In [74]:
series1['Spring']

1

In [75]:
labels

['label1', 'label2', 'label3', 'label4']

In [76]:
series3 = pd.Series(data = labels)
series3

0    label1
1    label2
2    label3
3    label4
dtype: object

In [77]:
series3[0]  # by index label (with the default integer index, label 0 is also position 0)

'label1'
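With an integer index, plain `[]` looks up by label, which is easy to confuse with position. The explicit accessors `.loc` (by label) and `.iloc` (by position) remove that ambiguity. A sketch reusing the `[1, 2, 5, 4]` index from the earlier `series3` cell, renamed `seasons` here:

```python
import pandas as pd

seasons = pd.Series(['Spring', 'Summer', 'AllSeasons', 'Winter'], index=[1, 2, 5, 4])

# .loc selects by index label
print(seasons.loc[5])    # AllSeasons

# .iloc selects by integer position (0-based)
print(seasons.iloc[2])   # AllSeasons

# The same number means different things to each accessor
print(seasons.loc[1])    # Spring
print(seasons.iloc[1])   # Summer
```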

#### Operations are also aligned on the index:
1) Pandas matches values by label; where a label appears in only one of the Series, the result is NaN

(Note: In computing, `NaN` stands for `Not a Number`; it is a value of a numeric data type that represents an undefined or unrepresentable result, especially in floating-point arithmetic.)

2) Because NaN is a floating-point value, Numpy and Pandas automatically convert the result to float so that it can hold NaN alongside the other numbers
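Both points can be seen directly: a single missing value turns an integer Series into `float64`, and because NaN never compares equal to itself, missing values should be detected with `isna()` rather than `==`. A short sketch with sample values:

```python
import numpy as np
import pandas as pd

# One missing value forces the whole Series to float64
s = pd.Series([1, None, 3])
print(s.dtype)            # float64

# NaN is not equal to itself, so == cannot detect it
print(np.nan == np.nan)   # False

# isna() flags missing values element-wise
print(s.isna().tolist())  # [False, True, False]
```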

In [78]:
series1

Spring    1
Summer    2
Autumn    3
Winter    4
dtype: int64

In [79]:
series2

Spring        1
Summer        2
AllSeasons    5
Winter        4
dtype: int64

In [80]:
series1 + series2

AllSeasons    NaN
Autumn        NaN
Spring        2.0
Summer        4.0
Winter        8.0
dtype: float64
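If a label missing from one Series should count as 0 instead of producing NaN, the method form `Series.add` accepts a `fill_value` parameter. A sketch that redefines `series1` and `series2` as in the cells above:

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['Spring', 'Summer', 'Autumn', 'Winter'])
series2 = pd.Series([1, 2, 5, 4], index=['Spring', 'Summer', 'AllSeasons', 'Winter'])

# fill_value=0 treats a label missing from one side as 0 before adding
total = series1.add(series2, fill_value=0)
print(total)
```

Here `Autumn` and `AllSeasons` keep their single-sided values (3 and 5) instead of becoming NaN.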

In [81]:
# Careful: the built-in sum(iterable, start) treats series2 as the start value,
# so this computes series2 + 1 + 2 + 3 + 4 (i.e., series2 + 10), not series1 + series2
sum(series1, series2)

Spring        11
Summer        12
AllSeasons    15
Winter        14
dtype: int64
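To total the values of one Series, use its `.sum()` method; for element-wise addition, use `+` or `.add()`. A sketch confirming what the built-in call above actually computed:

```python
import pandas as pd

series1 = pd.Series([1, 2, 3, 4], index=['Spring', 'Summer', 'Autumn', 'Winter'])
series2 = pd.Series([1, 2, 5, 4], index=['Spring', 'Summer', 'AllSeasons', 'Winter'])

# Total of one Series' values
print(series1.sum())  # 10

# Built-in sum(iterable, start) adds each value of series1 to the start value series2
builtin_result = sum(series1, series2)
print((builtin_result == series2 + 10).all())  # True
```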

In [82]:
# Exercise: create a Series with a custom index and a name



After learning **Series**, let's move on to **DataFrames**, which will expand on the concept of **Series**.

#### Note: The course materials are developed mainly based on personal experience and contributions from the Python learning community
Referenced books:
- Learning Python, 5th Edition, by Mark Lutz
- Python Data Science Handbook, by Jake VanderPlas
- Python for Data Analysis, by Wes McKinney

Copyright ©2023 Mei Najim. All rights reserved. 