<center>
  <a href="2.1-intro-to-python-data-analytics.ipynb">Previous Page</a> | <a href="./">Content Page</a> | <a href="2.3.Intro-to-pandas_Dataframes.ipynb">Next Page</a>
</center>

# 2.2 Introduction to pandas Data Structures (Series)

Data structures are how we store data in pandas. The two workhorse data structures are *Series* and *DataFrames*.

Here, we are going to examine Series.

##  2.2.1 Series

Series are simply sequences. In official terms, they are "one-dimensional array-like objects". Each entry is paired with an index label (by default the integers 0, 1, 2, and so on), which makes it easy to look up entries in the Series, especially when we are dealing with more than one Series.

In [21]:
# Example of a series 
from pandas import Series, DataFrame
import pandas as pd

series = Series([4,7,-8,0])
series 

0    4
1    7
2   -8
3    0
dtype: int64

In [22]:
# access values in a series 
series.values 

array([ 4,  7, -8,  0])

In [24]:
# access index of series 
series.index

RangeIndex(start=0, stop=4, step=1)

In [26]:
series[2]

-8

#### Indexing in Series

In [27]:
# custom set our index
series2 = Series([4,5,6,7], index=['d','b','a','c'])
series2

d    4
b    5
a    6
c    7
dtype: int64

In [28]:
# access values within a series using the index 
series2['a']

6

In [29]:
series2['b']

5
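The lookups above use plain square brackets. As a minimal sketch, the same Series can also be queried with the `.loc` (label-based) and `.iloc` (position-based) accessors, which make the intent explicit. The variable names `s`, `by_label` and `by_position` are chosen here for illustration only.

```python
import pandas as pd

# Same data as series2 above, rebuilt so the snippet runs on its own
s = pd.Series([4, 5, 6, 7], index=['d', 'b', 'a', 'c'])

by_label = s.loc['a']      # label-based lookup: the entry whose index label is 'a'
by_position = s.iloc[0]    # position-based lookup: the first entry, whatever its label

print(by_label)     # 6
print(by_position)  # 4
```

Using `.loc` and `.iloc` avoids ambiguity when the index labels are themselves integers.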

#### Accessing Multiple Values

In [30]:
series2[['a','b','c']]

a    6
b    5
c    7
dtype: int64
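Selecting with a list works when every label exists in the index; in recent pandas versions a missing label should raise a `KeyError`. As a hedged sketch, `reindex` is one way to request labels that may not exist: missing labels come back as NaN. The label `'z'` and the name `subset` are made up for this illustration.

```python
import pandas as pd

# Same data as series2 above, rebuilt so the snippet runs on its own
s = pd.Series([4, 5, 6, 7], index=['d', 'b', 'a', 'c'])

# reindex returns a new Series with exactly the requested labels;
# 'z' does not exist in s, so its value is NaN
subset = s.reindex(['a', 'b', 'z'])
print(subset)
```

Because NaN is a floating-point value, the result is stored as `float64` even though the original values were integers.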

`Python dictionaries` are data structures whose keys do not have to be numbers; a key can also be `a string of characters`. We can **convert** a dictionary into a Series and **preserve** its keys as the index.

In [31]:
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
# sdata is a dictionary

In [32]:
obj3 = Series(sdata)
obj3

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

Converting a dictionary into a Series `preserves the dictionary's key-value mapping`, so values stay attached to their keys (`even` if we ask for the keys in a different order).

In [33]:
# values are mapped to the dictionary's keys; California is not in sdata, so it becomes NaN
states = ['California', 'Oregon', 'Ohio', 'Texas']
# Ohio is still 35000.0
# Utah is dropped because it is not listed in states
obj4 = Series(sdata, index=states)
obj4

California        NaN
Oregon        16000.0
Ohio          35000.0
Texas         71000.0
dtype: float64

In [34]:
# Utah is not in the index
obj4.index

Index(['California', 'Oregon', 'Ohio', 'Texas'], dtype='object')

In [35]:
# detect missing data in a data structure 
pd.isnull(obj4)

California     True
Oregon        False
Ohio          False
Texas         False
dtype: bool

In [36]:
pd.notnull(obj4)

California    False
Oregon         True
Ohio           True
Texas          True
dtype: bool
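The boolean results above can be put to work directly. As a small sketch, the method forms `isna()` and `notna()` give the same masks as `pd.isnull` and `pd.notnull`, and the masks can be used to count or filter entries. The names `obj`, `missing_mask`, `n_missing` and `present` are illustrative only.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Oregon', 'Ohio', 'Texas']
obj = pd.Series(sdata, index=states)   # same construction as obj4 above

missing_mask = obj.isna()              # method form of pd.isnull(obj)
n_missing = int(missing_mask.sum())    # True counts as 1, so this counts missing entries
present = obj[obj.notna()]             # keep only the entries that have a value

print(n_missing)
print(present)
```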

### Exercise 2.2.1a: Series

1. Create a movie rating dictionary. 
2. Convert the dictionary into a Series.
3. Access a value within the Series.

##### Movie rating (Romance)
Assign `movierating_romance` as a dictionary containing the following ratings:
```
Forrest Gum is 5
Proposal is 3
Notebook is 3
Mr and Mrs Smith is 4
True Lies is 5
```

In [37]:
movierating_romance={'Forrest Gum': 5, '___': _, '___': _, '___': _, '___': _}

##### Convert it into a Series and name it "Smovierating_romance"

In [38]:
from pandas import Series, DataFrame
import pandas as pd
Smovierating_romance = ___(____)

NameError: name '____' is not defined

In [39]:
Smovierating_romance

NameError: name 'Smovierating_romance' is not defined

##### Access a value within the Series
We want to check the rating of Forrest Gum.

In [40]:
___[____]

NameError: name '____' is not defined

### Arithmetic Operations in Series

Another important feature of Series is that arithmetic operations align the data by index label, even when the two Series are indexed differently. A label that is missing from either Series gives NaN in the result.

* Refresh our memory of obj3
* and obj4
* then add them together

In [17]:
obj3 

Ohio      35000
Texas     71000
Oregon    16000
Utah       5000
dtype: int64

In [19]:
obj4

California        NaN
Oregon        16000.0
Ohio          35000.0
Texas         71000.0
dtype: float64

In [20]:
obj3 + obj4

California         NaN
Ohio           70000.0
Oregon         32000.0
Texas         142000.0
Utah               NaN
dtype: float64
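In the sum above, Utah becomes NaN because it has no counterpart in `obj4`. As a hedged sketch, the `Series.add` method with `fill_value=0` is one way to treat a label that is absent from one side as zero. The name `total` is illustrative; whether zero is a sensible stand-in depends on the data.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Oregon', 'Ohio', 'Texas']
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=states)

# fill_value=0 substitutes 0 when a value is missing on exactly one side
total = obj3.add(obj4, fill_value=0)
print(total)
```

Utah now keeps its value of 5000.0. California stays NaN, because it is missing on both sides: it is not in `obj3` at all, and its value in `obj4` is already NaN.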

#### Any arithmetic calculation involving a NaN value returns NaN (see the example below). It is good practice to handle NaN values during data cleaning, before we do any analysis

In [21]:
import numpy as np

b = 4.55
a = np.nan
a

nan

In [22]:
a+b

nan
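As a minimal sketch of what such cleaning can look like on a Series, `dropna` removes the missing entries and `fillna` replaces them with a chosen value. The replacement value 0 is an assumption made for illustration, not a general recommendation.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Oregon', 'Ohio', 'Texas']
obj = pd.Series(sdata, index=states)   # California is NaN

dropped = obj.dropna()    # new Series without the NaN entry
filled = obj.fillna(0)    # new Series with NaN replaced by 0

print(dropped)
print(filled)
print(obj.sum())          # reductions such as sum() skip NaN by default
```

Note the difference between element-wise arithmetic, which propagates NaN, and pandas reductions such as `sum()`, which skip NaN unless `skipna=False` is passed.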

### Exercise 2.2.1b: Movie Action
##### Movie rating (Action)
* Assign `movierating_action` as a dictionary containing the following ratings:

```
True Lies is 4.5
Mr and Mrs Smith is 3.5
Police Story is 5
Taken is 4
```
* Convert it into a Series and name it "Smovierating_action"
* Access the rating of the movie "Taken"

### Exercise 2.2.1c: Arithmetic Operations in Series

#### Averaging Movie Ratings
* Compute `(Smovierating_action + Smovierating_romance) / 2`


#### More detailed information:
* https://pandas.pydata.org/pandas-docs/stable/dsintro.html
* https://discuss.analyticsvidhya.com/t/what-is-the-difference-between-pandas-series-and-python-lists/27373/2
* https://stackoverflow.com/questions/26047209/what-is-the-difference-between-a-pandas-series-and-a-single-column-dataframe





### Possible Solution:

### Exercise 2.2.1a: Series




In [1]:
movierating_romance={'Forrest Gum': 5, 'Proposal': 3,
                     'Notebook': 3, 'Mr and Mrs Smith': 4,
                     'True Lies': 5}

from pandas import Series, DataFrame
import pandas as pd

In [2]:
Smovierating_romance = Series(movierating_romance)

In [3]:
Smovierating_romance

Forrest Gum         5
Proposal            3
Notebook            3
Mr and Mrs Smith    4
True Lies           5
dtype: int64

In [4]:
Smovierating_romance['Forrest Gum']

5


### Exercise 2.2.1b: Movie Action



In [5]:
movierating_action={'True Lies': 4.5,
                    'Mr and Mrs Smith': 3.5, 
                    'Police Story': 5,
                    'Taken': 4}

In [6]:
movierating_action['Taken']

4

In [7]:
Smovierating_action = Series(movierating_action)

In [8]:
Smovierating_action

True Lies           4.5
Mr and Mrs Smith    3.5
Police Story        5.0
Taken               4.0
dtype: float64


### Exercise 2.2.1c: Arithmetic Operations in Series



In [9]:
(Smovierating_action + Smovierating_romance) /2

Forrest Gum          NaN
Mr and Mrs Smith    3.75
Notebook             NaN
Police Story         NaN
Proposal             NaN
Taken                NaN
True Lies           4.75
dtype: float64
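Only the two movies rated in both genres get an average above; every other movie is NaN. As one possible alternative (a sketch, not part of the exercise), the two Series can be placed side by side with `pd.concat` and averaged row by row, so that each movie is averaged over the ratings it actually has. The names `romance`, `action`, `ratings` and `average` are illustrative.

```python
import pandas as pd

romance = pd.Series({'Forrest Gum': 5, 'Proposal': 3, 'Notebook': 3,
                     'Mr and Mrs Smith': 4, 'True Lies': 5})
action = pd.Series({'True Lies': 4.5, 'Mr and Mrs Smith': 3.5,
                    'Police Story': 5, 'Taken': 4})

# Put the two Series side by side as columns; rows are aligned on movie title,
# and a movie missing from one Series gets NaN in that column
ratings = pd.concat([romance, action], axis=1)

# mean(axis=1) averages across each row and skips NaN by default
average = ratings.mean(axis=1)
print(average)
```

Movies rated in both genres get the same average as before, while movies rated in only one genre keep that single rating.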