# Getting Started - Pandas Series

* References:
  * https://pandas.pydata.org/docs/user_guide/10min.html
  * https://pandas.pydata.org/docs/user_guide/dsintro.html

We will begin by introducing the `Series`, `DataFrame`, and `Index` classes, which are the basic building blocks of the pandas library, and showing how to work with them. By the end of this section, you will be able to create DataFrames and perform operations on them to inspect and filter the data.

## Intro to data structures

We’ll start with a quick, non-comprehensive overview of the fundamental data structures in pandas. The fundamental behavior of data types, indexing, and axis labeling/alignment applies across all of the objects.

* pandas Series
* pandas DataFrame

A pandas Series or DataFrame can be created from an existing Python data structure:

* a NumPy ndarray
* a list of lists
* a list of namedtuples
* a list of dictionaries

To get started, import NumPy and load pandas into your namespace:

In [64]:
import numpy as np
import pandas as pd

## 1. Create a Pandas Series

Series is a one-dimensional labeled array capable of holding any data type 
(integers, strings, floating point numbers, Python objects, etc.). 
The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
    
> s = pd.Series(data, index=index)

Here, data can be many different things:

* a Python dict

* an ndarray

* a scalar value (like 5)

The passed index is a list of axis labels. Thus, this separates into a few cases depending on what data is:
    

### 1.1 Create a series from ndarray

If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].

In [65]:
# Create a pandas Series from a 1-D NumPy array of 5 standard-normal samples (np.random.randn)
s = pd.Series(np.random.randn(5))
s              

0   -1.823149
1   -1.416792
2   -0.098841
3   -0.071691
4   -0.361604
dtype: float64

In [66]:
# Create a Series with a labeled index
s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
s

a    1.957541
b   -0.680687
c   -0.606930
d    0.094761
e    1.603763
dtype: float64
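
The length rule stated at the start of this section can be checked directly. The following is a minimal sketch with illustrative variable names (`data`, `ok`, `raised` are not part of the original notebook): an index with the wrong number of labels raises a `ValueError`.

```python
import numpy as np
import pandas as pd

data = np.arange(3)  # three values

# Matching lengths: one label per value
ok = pd.Series(data, index=["a", "b", "c"])

# Mismatched lengths: two labels for three values
try:
    pd.Series(data, index=["a", "b"])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```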

### 1.2 Create a series from a list


In [67]:
# Create a pandas Series from a range() of integers
s = pd.Series(range(100,105))
s    

0    100
1    101
2    102
3    103
4    104
dtype: int64

In [68]:
# Create a pandas Series from a list of mixed data types
s = pd.Series(['a','b','c','d',1, 2, 3.50])
s    

0      a
1      b
2      c
3      d
4      1
5      2
6    3.5
dtype: object

In [69]:
# Create 26 lower case alphabet letters using a list comprehension
alphabet = pd.Series([chr(ord('a') + i) for i in range(26)], index=range(1, 27))
alphabet 

1     a
2     b
3     c
4     d
5     e
6     f
7     g
8     h
9     i
10    j
11    k
12    l
13    m
14    n
15    o
16    p
17    q
18    r
19    s
20    t
21    u
22    v
23    w
24    x
25    y
26    z
dtype: object

### 1.3 Create a series from dict

Series can be instantiated from dicts:

In [70]:
d = {"b": 1, "a": 0, "c": 2}
pd.Series(d)

b    1
a    0
c    2
dtype: int64

<b> Note</b>: 
> When the data is a dict, and an index is not passed, the Series index will be ordered by the dict’s insertion order
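
When an index is passed together with a dict, the values are looked up by label instead. The sketch below reuses the dict `d` from the cell above (the name `s_from_dict` is illustrative); a label that is missing from the dict gets `NaN`, which also turns the dtype into `float64`.

```python
import pandas as pd

d = {"b": 1, "a": 0, "c": 2}

# Values are pulled out by label; "d" is not a key in the dict, so it becomes NaN
s_from_dict = pd.Series(d, index=["b", "c", "d", "a"])
print(s_from_dict)
```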

### 1.4 Create a series from a scalar value

In [71]:
# Create a 5-element series with each element value of 4.0
s2 = pd.Series(4.0, index=range(5))
s2

0    4.0
1    4.0
2    4.0
3    4.0
4    4.0
dtype: float64

In [76]:
# With a named index
s = pd.Series(4.0, index=["a", "b", "c", "d", "e"])
s

a    4.0
b    4.0
c    4.0
d    4.0
e    4.0
dtype: float64

## 2. Selecting and Slicing
Selecting one element or a slice of elements from a pandas Series is similar to selecting and slicing a Python list or a Python dict:

In [77]:
# By position (.iloc makes positional access explicit)
s.iloc[0]

4.0

In [78]:
# Using an index name like a dictionary
s['a']

4.0

In [79]:
# First 3 elements 0,1,2
s[:3]

a    4.0
b    4.0
c    4.0
dtype: float64

In [80]:
# Last element, by position
s.iloc[-1]

4.0

In [81]:
# all elements
s[:]

a    4.0
b    4.0
c    4.0
d    4.0
e    4.0
dtype: float64

In [82]:
# Reversed order
s[::-1]

e    4.0
d    4.0
c    4.0
b    4.0
a    4.0
dtype: float64
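
Because plain `[]` accepts both labels and positions, it can be ambiguous on a labeled Series. A minimal sketch of the explicit accessors follows, using an illustrative Series `s_demo` that is not part of the original notebook: `.iloc` selects by position and `.loc` selects by label. Note that a label slice includes its end label, while a positional slice excludes its end position.

```python
import pandas as pd

s_demo = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

first = s_demo.iloc[0]             # by position
last = s_demo.iloc[-1]             # last position
by_label = s_demo.loc["a"]         # by label
pos_slice = s_demo.iloc[0:3]       # positions 0, 1, 2 (end excluded)
label_slice = s_demo.loc["a":"c"]  # labels a, b, c (end included)

print(first, last, by_label)
print(pos_slice.equals(label_slice))
```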

## 3. Series Data Type - dtype
Like a NumPy array, a pandas Series has a dtype.

For the most part, pandas uses NumPy arrays and dtypes for a Series or for individual columns of a DataFrame. Common pandas dtypes include:

* 'O' - object: text, or mixed numeric and string values
* 'float64' - float64: floating point numbers, corresponding to the Python float type
* 'int64' - int64: 64-bit integers, the closest match to the Python int type
* 'bool' - bool: True/False values, corresponding to the Python bool type
* '<M8[ns]' - datetime64: date and time values
* '<m8[ns]' - timedelta64: differences between two datetimes
* category: a finite set of possible values, often text



pandas and third-party libraries extend NumPy’s type system in a few places, for example with categorical, nullable integer, and string types. For methods that take a dtype argument, these types can usually be specified as strings. The full list of pandas extension types is in the user guide:

https://pandas.pydata.org/docs/user_guide/basics.html#basics-dtypes

In [83]:
s = pd.Series(['a','b','c','d',1, 2, 3.50])
s.dtype

dtype('O')

In [84]:
s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
s.dtype

dtype('float64')

In [85]:
import datetime
today = datetime.datetime.now()
daydiff = [datetime.timedelta(days=i) for i in range(7)]
lastweek=[today - datetime.timedelta(days=i) for i in range(7)]
lastweek

[datetime.datetime(2022, 4, 3, 22, 42, 17, 920601),
 datetime.datetime(2022, 4, 2, 22, 42, 17, 920601),
 datetime.datetime(2022, 4, 1, 22, 42, 17, 920601),
 datetime.datetime(2022, 3, 31, 22, 42, 17, 920601),
 datetime.datetime(2022, 3, 30, 22, 42, 17, 920601),
 datetime.datetime(2022, 3, 29, 22, 42, 17, 920601),
 datetime.datetime(2022, 3, 28, 22, 42, 17, 920601)]

In [86]:
s = pd.Series(lastweek)
s.dtype

dtype('<M8[ns]')

In [87]:
s = pd.Series(daydiff)
s.dtype 

dtype('<m8[ns]')

In [88]:
t = pd.CategoricalDtype(categories=['b', 'a'], ordered=True)
s = pd.Series(['a', 'b', 'a', 'c'], dtype=t)
s.dtype

CategoricalDtype(categories=['b', 'a'], ordered=True)

In [89]:
boolist = [i > 3 for i in range(5)]
s = pd.Series(boolist)
s.dtype

dtype('bool')
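
The extension types mentioned earlier in this section can be requested by passing a string as `dtype`. The sketch below is illustrative only (the names `ints`, `names`, and `as_float` are not from the original notebook) and assumes a pandas version that ships the nullable `"Int64"` and `"string"` dtypes (pandas 1.0 or later).

```python
import pandas as pd

# Nullable integer extension dtype (note the capital "I"); None becomes pd.NA
ints = pd.Series([1, 2, None], dtype="Int64")

# Dedicated string extension dtype
names = pd.Series(["a", "b", None], dtype="string")

# Converting an existing Series to another dtype with astype()
as_float = pd.Series([1, 2, 3]).astype("float64")

print(ints.dtype, names.dtype, as_float.dtype)
```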

## 4. Apply Numpy functions to a Series

A Series acts very similarly to an ndarray and is a valid argument to most NumPy functions. However, operations such as slicing will also slice the index.
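
As a small illustration of the point about the index, the sketch below uses a hypothetical Series `s_demo` (not part of the original notebook). A NumPy function applied to a Series returns a Series with the same labels, a slice keeps the matching labels, and `to_numpy()` gives a plain array without labels.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([1.0, 4.0, 9.0, 16.0], index=["a", "b", "c", "d"])

roots = np.sqrt(s_demo)       # still a Series, same index
tail = s_demo.iloc[1:3]       # values and labels are sliced together
raw = s_demo.to_numpy()[1:3]  # plain ndarray, no labels

print(roots)
print(tail)
print(raw)
```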

In [90]:
s = pd.Series(np.random.randn(5))
# The s_ prefix avoids shadowing Python built-ins such as max, min, and sum
s_max = np.max(s)
s_min = np.min(s)
s_sum = np.sum(s)
s_mean = np.mean(s)
s_std = np.std(s)
s_cumsum = np.cumsum(s)
s_exp = np.exp(s)
s_sqrt = np.sqrt(abs(s))
print(s_max, s_min, s_sum, s_mean, s_std)
print("cumsum():")
print(s_cumsum)
print("exp():")
print(s_exp)
print("sqrt(abs())")
print(s_sqrt)

2.091985513185935 -1.6127533657641944 3.977740181035993 0.7955480362071986 1.2939159588772036
cumsum():
0    1.642270
1    3.734255
2    4.399373
3    2.786620
4    3.977740
dtype: float64
exp():
0    5.166883
1    8.100984
2    1.944720
3    0.199338
4    3.290766
dtype: float64
sqrt(abs())
0    1.281511
1    1.446370
2    0.815548
3    1.269942
4    1.091385
dtype: float64


In [91]:
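# The same statistics are available as Series methods.
# Note: s.std() defaults to the sample standard deviation (ddof=1),
# so it can differ from a result computed with ddof=0.
s_max = s.max()
s_min = s.min()
s_sum = s.sum()
s_mean = s.mean()
s_std = s.std()
s_cumsum = s.cumsum()
print(s_max, s_min, s_sum, s_mean, s_std)
print("cumsum():")
print(s_cumsum)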

2.091985513185935 -1.6127533657641944 3.977740181035993 0.7955480362071986 1.446642020610625
cumsum():
0    1.642270
1    3.734255
2    4.399373
3    2.786620
4    3.977740
dtype: float64
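
The two cells above print different standard deviations (1.2939... versus 1.4466...). This is consistent with the different defaults for the `ddof` parameter: `Series.std()` divides by N - 1 (`ddof=1`), while NumPy's `std` divides by N (`ddof=0`). A minimal sketch with an illustrative Series `s_demo` (not from the original notebook):

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

sample_std = s_demo.std()              # pandas default: ddof=1 (divide by N - 1)
population_std = s_demo.std(ddof=0)    # divide by N
numpy_std = np.std(s_demo.to_numpy())  # NumPy default on an ndarray: ddof=0

print(sample_std, population_std, numpy_std)
```

Passing `ddof` explicitly makes the intended definition clear regardless of which library computes the result.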
