# pandas Data Structures:
**Series** and **DataFrame** are the **two workhorse** data structures in pandas.<br> Let's talk about Series first:
## Series: 
A Series is a one-dimensional array-like object that contains a sequence of values and an associated array of labels, called its index. A Series can be indexed using these labels. <br>
*(A Series is similar to a NumPy array; in fact, it is built on top of the NumPy array object.)
<br>A Series can hold any Python object.*
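To make the "values plus labels" idea concrete, here is a minimal sketch, assuming a reasonably recent pandas version, that builds a small Series and looks at its two parts separately. It uses `Series.to_numpy()`, a method not covered in this notebook, to get the values as a NumPy array.

```python
import numpy as np
import pandas as pd

# A Series pairs an array of values with an index of labels
s = pd.Series([100, 200, 300], index=["x", "y", "z"])

print(s.index)       # the labels
print(s.to_numpy())  # the values, as a NumPy array
print(s["y"])        # look up a value by its label: 200
```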

Let's get hands-on and learn the concepts of Series with examples:

In [1]:
# first things first, we need to import NumPy and pandas
# np and pd are the conventional aliases for NumPy and pandas
# explore the pandas documentation with pd.<TAB> and pd?
import numpy as np
import pandas as pd

**We can create a Series from a list, a NumPy array, or a dictionary**<br>
Let's create these objects and convert them into pandas Series!

**Series using lists**<br>
Let's create a Python list containing labels and another containing data.

In [2]:
my_labels = ['x','y','z'] 
my_data = [100,200,300] 

So, we have two Python list objects:<br>

* `my_labels` - a list of strings, and
* `my_data` - a list of numbers

We can use **`pd.Series`** (with a capital S) to convert a Python list to a pandas Series. <br>

&#9758; *If you press `<Shift+Tab>`, you will see that `Series` takes a wide variety of parameters; for now we will focus on `data` and `index`. Let's start with `data` only and see how it works!*
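If you are working outside Jupyter (where `Shift+Tab` is not available), Python's standard `inspect` module can list the same constructor parameters. A small sketch, assuming only that pandas is installed:

```python
import inspect

import pandas as pd

# List the parameter names of the Series constructor
params = list(inspect.signature(pd.Series).parameters)

print(params[:2])  # ['data', 'index']: the two parameters used in this notebook
print(params)      # the full list (it can vary slightly between pandas versions)
```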

In [3]:
# Converting my_data (Python list) to Series (pandas series)
pd.Series(data = my_data)
# shift+tab, we will focus on data and index at the moment

0    100
1    200
2    300
dtype: int64

The column 0, 1, 2 is the index that pandas generated automatically for the elements 100, 200, and 300. We can specify our own index values and retrieve the data points using them.<br> 
Let's pass `my_labels` to the Series as the index.   

In [4]:
pd.Series(data = my_data, index = my_labels)
# pd.Series(my_data, my_labels)  # equivalent, because data and index are the first two parameters (Shift+Tab)

x    100
y    200
z    300
dtype: int64
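Once the index is set, a data point can be retrieved by its label. The sketch below also uses `.loc` (label-based) and `.iloc` (position-based), which this notebook has not introduced yet, and shows that pandas rejects an index whose length differs from the data:

```python
import pandas as pd

my_labels = ["x", "y", "z"]
my_data = [100, 200, 300]
s = pd.Series(data=my_data, index=my_labels)

# Retrieve a data point by its label...
print(s["z"])      # 300
# ...or explicitly by label (.loc) or by integer position (.iloc)
print(s.loc["z"])  # 300
print(s.iloc[0])   # 100

# The index must have the same length as the data
try:
    pd.Series(data=my_data, index=["x", "y"])
except ValueError as err:
    print("ValueError:", err)
```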

**Series using NumPy arrays**

In [5]:
# Let's create a NumPy array from my_data and then a Series from that array
my_array = np.array(my_data)
pd.Series(data = my_array)

0    100
1    200
2    300
dtype: int64

In [6]:
pd.Series(data = my_array, index = my_labels)
# pd.Series(my_array, my_labels) # data and index are in order

x    100
y    200
z    300
dtype: int64
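Because a Series is built on top of a NumPy array, it takes its `dtype` from the array it is created from. A minimal sketch, with two hypothetical arrays `int32_array` and `float_array` chosen only for illustration:

```python
import numpy as np
import pandas as pd

# The Series keeps the dtype of the NumPy array it is built from
int32_array = np.array([100, 200, 300], dtype=np.int32)
float_array = np.array([1.5, 2.5, 3.5])

s_int = pd.Series(int32_array, index=["x", "y", "z"])
s_float = pd.Series(float_array)

print(s_int.dtype)    # int32
print(s_float.dtype)  # float64
```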

**Series using dictionary**

In [7]:
# Let's create a dictionary my_dic
my_dic = {'x':100,'y':200,'z':300}
pd.Series(my_dic)

x    100
y    200
z    300
dtype: int64

***Notice the difference here:<br> if we pass a dictionary to `Series`, pandas takes the keys as the index (labels) and the values as the data.***
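A quick way to see the key/value mapping is to round-trip a dictionary through a Series. This sketch assumes `Series.to_dict()`, a standard Series method not otherwise covered here:

```python
import pandas as pd

my_dic = {"x": 100, "y": 200, "z": 300}
s = pd.Series(my_dic)

print(list(s.index))  # the dictionary keys become the labels: ['x', 'y', 'z']
print(s.tolist())     # the dictionary values become the data: [100, 200, 300]
print(s.to_dict())    # to_dict() turns the Series back into a dictionary
```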

**Series can hold a wide variety of object types; let's see some examples:**

In [8]:
# let's pass my_labels (which is a list of strings) as data
pd.Series(data = my_labels)

0    x
1    y
2    z
dtype: object

In [9]:
# We can pass a list of built-in functions!
pd.Series([min, max, sum, print])
# This is just an example; you will rarely see this in the real world!

0      <built-in function min>
1      <built-in function max>
2      <built-in function sum>
3    <built-in function print>
dtype: object
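When the elements have different types, pandas falls back to the generic `object` dtype and stores each element as the Python object it is. A small sketch; the names `mixed` and `funcs` are just for illustration:

```python
import pandas as pd

# Mixing int, str, and float forces the generic object dtype
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)  # object

# Each element keeps its own Python type
print([type(v).__name__ for v in mixed])

# Functions stored in a Series can still be called
funcs = pd.Series([min, max, sum])
print(funcs[2]([1, 2, 3]))  # sum([1, 2, 3]) -> 6
```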

### Grabbing data from Series:
The index is the key thing to understand in a Series. pandas uses the index labels (numbers or names) for fast information retrieval; an index works much like a hash table or a dictionary.
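The dictionary analogy goes quite far. In this sketch (a made-up three-city Series named `ser`), `in` checks the labels rather than the values, `.get()` returns a default for a missing label, and square brackets raise a `KeyError`, just as a dict would:

```python
import pandas as pd

ser = pd.Series({"Toronto": 500, "Calgary": 200, "Vancouver": 300})

# Like a dict, `in` checks the labels (index), not the values
print("Calgary" in ser)  # True
print(200 in ser)        # False: 200 is a value, not a label

# .get() returns a default instead of raising for a missing label
print(ser.get("Jasper", 0))  # 0

# Square-bracket lookup of a missing label raises KeyError
try:
    ser["calgary"]  # labels are case-sensitive
except KeyError:
    print("KeyError: 'calgary' is not in the index")
```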

To understand these concepts, let's create three Series, `ser1`, `ser2`, and `ser3`, from dictionaries with some sample data:

In [10]:
# Creating dictionaries
dic_1 = {'Toronto': 500, 'Calgary': 200, 'Vancouver': 300, 'Montreal': 700}
dic_2 = {'Calgary': 200, 'Vancouver': 300, 'Montreal': 700}
dic_3 = {'Calgary': 200, 'Vancouver': 300, 'Montreal': 700, 'Jasper':1000}

In [11]:
# Creating pandas series from the dictionaries
ser1 = pd.Series(dic_1)
ser2 = pd.Series(dic_2)
ser3 = pd.Series(dic_3)

In [12]:
# Grabbing information from a Series is very similar to a dictionary lookup.
ser1['Calgary'] # it's case-sensitive: "calgary" is not the same as "Calgary"

200

Note that we pass the string "Calgary" because our index contains strings (the names of the cities). If the index contained numbers, we would pass a number instead.
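With a numeric index, square brackets look up the label, not the position, which can be surprising when the labels are not 0, 1, 2 in order. A sketch with a hypothetical Series `nums`, using `.loc` and `.iloc` to make the intent explicit:

```python
import pandas as pd

# Integer labels, deliberately out of order
nums = pd.Series([10, 20, 30], index=[2, 0, 1])

print(nums[0])       # 20: with an integer index, [] looks up the label 0
print(nums.loc[0])   # 20: .loc is always label-based
print(nums.iloc[0])  # 10: .iloc is always position-based
```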

**Just a quick note:**<br> 
*When we pass only a dictionary, older pandas versions (before 0.23) sort the dictionary's keys to build the index (here C, M, T, V), which is what the outputs in this notebook show. From pandas 0.23 on (with Python 3.6+), the index keeps the dictionary's insertion order instead, so `ser1` would start with Toronto. Either way, we can control the order by passing the dictionary keys, in the order we want, as the `index` argument.*
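A minimal sketch of overriding the order with the `index` argument, using the same `dic_1` as in the next cell. It also shows that a label missing from the dictionary becomes NaN, and uses `sort_index()` for an alphabetical order that does not depend on the pandas version:

```python
import pandas as pd

dic_1 = {"Toronto": 500, "Calgary": 200, "Vancouver": 300, "Montreal": 700}

# index= picks the labels, and their order, from the dictionary's keys
ordered = pd.Series(dic_1, index=["Vancouver", "Toronto", "Calgary", "Montreal"])
print(ordered.index.tolist())

# A label that is not a key of the dictionary gets NaN
with_missing = pd.Series(dic_1, index=["Calgary", "Jasper"])
print(with_missing)

# sort_index() gives alphabetical order regardless of the pandas version
print(pd.Series(dic_1).sort_index().index.tolist())
```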

In [13]:
ser1 # keys sorted: C, M, T, V (older pandas)

Calgary      200
Montreal     700
Toronto      500
Vancouver    300
dtype: int64

In [14]:
ser2 # keys sorted: C, M, V (older pandas)

Calgary      200
Montreal     700
Vancouver    300
dtype: int64

<b>Basic operations on Series are aligned on the index.</b> <br>
For example, when we compute ser1 + ser2, pandas matches up the values by index label. For Calgary, Montreal, and Vancouver it adds the values, whereas Toronto has no match in ser2, so the result there is NaN.

In [15]:
ser4 = ser1 + ser2 
ser4

Calgary       400.0
Montreal     1400.0
Toronto         NaN
Vancouver     600.0
dtype: float64

In [16]:
# Let's look at ser3!
ser3 # keys sorted: C, J, M, V (older pandas)

Calgary       200
Jasper       1000
Montreal      700
Vancouver     300
dtype: int64

In [17]:
ser5 = ser4 + ser3
ser5

Calgary       600.0
Jasper          NaN
Montreal     2100.0
Toronto         NaN
Vancouver     900.0
dtype: float64

Notice that the values were added wherever both Series share an index label; where there is no match, the result is NaN (not a number), which pandas uses to **mark missing or NA values**. Because NaN is a floating-point value, the result's dtype also changed from int64 to float64.<br>
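If a missing match should count as zero rather than NaN, the method form `Series.add()` accepts a `fill_value` argument. A sketch that rebuilds `ser1` and `ser2` from the dictionaries above:

```python
import pandas as pd

ser1 = pd.Series({"Toronto": 500, "Calgary": 200, "Vancouver": 300, "Montreal": 700})
ser2 = pd.Series({"Calgary": 200, "Vancouver": 300, "Montreal": 700})

# The + operator aligns on the index; unmatched labels become NaN
plain = ser1 + ser2
print(plain)

# add() with fill_value treats a label missing on one side as 0
total = ser1.add(ser2, fill_value=0)
print(total)
```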

## Good to know!

### `isnull()`, `notnull()` 
* detect missing data

In [18]:
# pd.isnull(ser4)  # equivalent top-level function
ser4.isnull()
# Shift+Tab shows that its Type is method

Calgary      False
Montreal     False
Toronto       True
Vancouver    False
dtype: bool

In [19]:
# pd.notnull(ser5)  # equivalent top-level function
ser5.notnull()

Calgary       True
Jasper       False
Montreal      True
Toronto      False
Vancouver     True
dtype: bool
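A few related helpers are often used together with `isnull()`: `isna()` is an alias for it, summing the Boolean result counts the missing values, and `dropna()`/`fillna()` remove or replace them. A sketch on a hand-built copy of `ser4`, with the values taken from the output above:

```python
import numpy as np
import pandas as pd

# Hand-built copy of ser4 from the output above
ser4 = pd.Series({"Calgary": 400.0, "Montreal": 1400.0, "Toronto": np.nan, "Vancouver": 600.0})

print(ser4.isna().equals(ser4.isnull()))  # True: isna() is an alias of isnull()
print(ser4.isnull().sum())  # 1: each True counts as 1
print(ser4.dropna())        # drop the missing entries
print(ser4.fillna(0))       # or replace them with a value
```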

### `axes`, `values` 
* `axes`: returns a list of the row axis labels (the index)
* `values`: returns the values/data (a NumPy array for standard dtypes)<br>

Let's try `axes` and `values` on our series!

In [20]:
# returns a list containing the row axis labels (the index)
ser1.axes
# Shift+Tab shows that axes is a property (an attribute), so no parentheses are needed

[Index(['Calgary', 'Montreal', 'Toronto', 'Vancouver'], dtype='object')]

In [21]:
# returns the values/data
ser1.values

array([200, 700, 500, 300])
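Note that `axes` returns a list holding the index (a Series has only one axis), while `values` returns a NumPy array rather than a Python list. A sketch that rebuilds `ser1` in the sorted order shown above; `to_numpy()` is the method the pandas documentation suggests as an explicit alternative to `values`:

```python
import numpy as np
import pandas as pd

# Rebuild ser1 in the (sorted) order shown in the outputs above
ser1 = pd.Series({"Calgary": 200, "Montreal": 700, "Toronto": 500, "Vancouver": 300})

# For a Series, axes is a one-element list holding the index
print(len(ser1.axes))                   # 1
print(ser1.axes[0].equals(ser1.index))  # True

# values / to_numpy() return a NumPy array, not a Python list
print(type(ser1.values))
print(ser1.to_numpy().tolist())  # [200, 700, 500, 300]
```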

### `head()`, `tail()`
To view a small sample of a Series or DataFrame object (we will learn about `DataFrame` in the next lecture), use the **`head()`** and **`tail()`** methods: `head()` shows the first elements and `tail()` the last ones. <br>
The default number of elements to display is **five**, but you may pass a custom number.

In [22]:
ser1.head(1)

Calgary    200
dtype: int64

In [23]:
ser1.tail(1)

Vancouver    300
dtype: int64
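Without an argument, `head()` and `tail()` return five elements; a negative `n` counts from the other end. A sketch on a hypothetical ten-element Series `s`:

```python
import pandas as pd

s = pd.Series(range(10))

print(len(s.head()))        # 5: the default n
print(s.tail(3).tolist())   # [7, 8, 9]
print(s.head(-2).tolist())  # everything except the last 2 elements
```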

### `size`
* To check the number of elements in your data.

In [24]:
ser1.size

4
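`size` counts every element, including missing ones; `count()` counts only the non-missing values. A small sketch comparing them with `len()` and `shape` on a hypothetical Series `s`:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

print(s.size)     # 3: every element, missing or not
print(len(s))     # 3: same as size for a Series
print(s.shape)    # (3,)
print(s.count())  # 2: non-missing elements only
```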

### `empty` 
* True if the Series is empty

In [25]:
# True for empty series
ser1.empty

False
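`empty` is True only when the Series has no elements at all; a Series holding only NaN is not considered empty. A sketch:

```python
import numpy as np
import pandas as pd

print(pd.Series(dtype="float64").empty)     # True: no elements at all
print(pd.Series([np.nan]).empty)            # False: one (missing) element
print(pd.Series([np.nan]).dropna().empty)   # True once the NaN is dropped
```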

# Excellent!
### Let's have a quick overview of what we have learned and then move on to DataFrames to expand our concepts of Series.
# Good Luck