# Pandas

- is a Python library used to analyze data.
- has functions for analyzing, cleaning, exploring, and manipulating data.
- its name refers to both "Panel Data" and "Python Data Analysis".

***Why Pandas?***
- lets us analyze big data and draw conclusions based on statistical theory.
- can clean messy data sets and make them readable and relevant.

## <span style="color: #cccc32">Series</span>

- is like a column in a table.
- is a one-dimensional labeled array that can hold data of any type (integers, floats, strings, Python objects, etc.).
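To illustrate "any type", here is a small sketch: pandas infers the `dtype` of a Series from the values you give it (the variable names are just examples).

```python
import pandas as pd

ints = pd.Series([1, 2, 3])             # whole numbers -> int64
floats = pd.Series([1.5, 2.0, 3.25])    # decimals -> float64
flags = pd.Series([True, False, True])  # booleans -> bool
mixed = pd.Series([1, 'two', 3.0])      # mixed Python objects -> object

for s in (ints, floats, flags, mixed):
    print(s.dtype)
```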

***Labels (index)***
- If no index is specified, the values are labeled with their position: the first value has index 0, the second has index 1, and so on.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0])

print(data)
print('---------------------')
print(type(data))
print('---------------------')
print(data.values)
print('---------------------')
print(data.index)
print('---------------------')
print(data.keys())   # keys() is a method; it returns the index (same as data.index)
print('---------------------')
```

```
0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64
---------------------
<class 'pandas.core.series.Series'>
---------------------
[0.25 0.5  0.75 1.  ]
---------------------
RangeIndex(start=0, stop=4, step=1)
---------------------
RangeIndex(start=0, stop=4, step=1)
---------------------
```
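A Series is essentially its values plus its index. As a sketch, `to_numpy()` returns the values as a NumPy array (the pandas docs suggest it over `.values` in newer code), and `index.tolist()` gives the labels as a plain list:

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0])

arr = data.to_numpy()          # the values as a NumPy array (same contents as data.values)
labels = data.index.tolist()   # the index labels as a plain Python list -> [0, 1, 2, 3]

print(arr)
print(labels)
print(data.keys().equals(data.index))  # keys() returns the index -> True
```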


<span style="color:#41bc66">**describe()**</span>

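- returns summary statistics of the data in a Series or DataFrame. For numeric data it reports:
    - `count`: the number of non-missing (non-NaN) values.
    - `mean`: the average (mean) value.
    - `std`: the (sample) standard deviation.
    - `min`: the minimum value.
    - `25%`: the 25th percentile\*.
    - `50%`: the 50th percentile\* (the median).
    - `75%`: the 75th percentile\*.
    - `max`: the maximum value.

\*A percentile shows where a share of the values falls: roughly 25% of the values lie below the 25th percentile.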
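To see what the percentile rows mean, here is a sketch on a tiny made-up Series: the `describe()` percentiles match `Series.quantile()`, which by default interpolates linearly between neighboring values.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

stats = s.describe()
print(stats['25%'], s.quantile(0.25))   # 2.0 2.0
print(stats['50%'], s.median())         # 3.0 3.0  (the 50% row is the median)
print(stats['75%'], s.quantile(0.75))   # 4.0 4.0
```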

```python
import pandas as pd

data_list = [3, 6, 9, 8, 5, 4, 2, 6, 3, 5, 8]

data = pd.Series(data_list)

data_description = data.describe()   # describe() itself returns a Series

print(type(data_description))
print('---------------------')
print(data_description)
print('---------------------')
```

```
<class 'pandas.core.series.Series'>
---------------------
count    11.000000
mean      5.363636
std       2.292280
min       2.000000
25%       3.500000
50%       5.000000
75%       7.000000
max       9.000000
dtype: float64
---------------------
```
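Because `count` only includes non-missing values, it can be smaller than the length of the Series. A short sketch with one missing value (`None` is stored as `NaN`):

```python
import pandas as pd

s = pd.Series([3, None, 9, 8])   # None becomes NaN, so the dtype is float64

summary = s.describe()
print(len(s))              # 4 values in total
print(summary['count'])    # 3.0 non-missing values
print(summary['mean'])     # mean of 3, 9 and 8 only; NaN is skipped
print(s.isna().sum())      # 1 missing value
```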


<span style="color:#41bc66">**agg()**</span>

- works like `describe()`, but it computes only the statistics (aggregations) you pass to it, e.g. a list of function names.

```python
import pandas as pd

data_list = [3, 6, 9, 8, 5, 4, 2, 6, 3, 5, 8]

data = pd.Series(data_list)

data_description = data.agg(['min', 'max', 'mean', 'std'])   # only these four statistics

print(data_description)
print('-------------------------')
```

```
min     2.000000
max     9.000000
mean    5.363636
std     2.292280
dtype: float64
-------------------------
```
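`agg()` also accepts a single function name, in which case it returns one value instead of a Series. A sketch on the same data; `'sum'`, `'median'` and `'count'` are standard pandas aggregation names:

```python
import pandas as pd

data = pd.Series([3, 6, 9, 8, 5, 4, 2, 6, 3, 5, 8])

single = data.agg('sum')                        # one name -> a single value (59)
several = data.agg(['sum', 'median', 'count'])  # a list of names -> a Series indexed by name

print(single)
print(several)
```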


<span style="color:#41bc66">**Accessing & Slicing**</span>

- we can access a Series by index and slice it just like a Python `list` (`[start:stop:step]`, with `stop` excluded).

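```python
import pandas as pd

data_list = [3, 6, 9, 8, 5, 4, 2, 6, 3, 5, 8]
data = pd.Series(data_list)   # build the Series here so the cell runs on its own

print(data[1:3])      # positions 1 and 2 (stop is excluded)
print('----------------------')
print(data[1:6:2])    # positions 1, 3 and 5 (step of 2)
print('----------------------')
print(data[5])        # the value with index label 5
print('----------------------')
```

```
1    6
2    9
dtype: int64
----------------------
1    6
3    8
5    4
dtype: int64
----------------------
4
----------------------
```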
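With the default integer index, `data[5]` looks up the *label* 5, which happens to equal the position. When labels and positions differ, the explicit accessors `.loc` (by label) and `.iloc` (by position) avoid ambiguity. A sketch with made-up letter labels:

```python
import pandas as pd

s = pd.Series([3, 6, 9, 8], index=['a', 'b', 'c', 'd'])

print(s.loc['b'])      # by label -> 6
print(s.iloc[1])       # by position -> 6
print(s.loc['a':'c'])  # label slices INCLUDE the end label: a, b, c
print(s.iloc[0:2])     # position slices EXCLUDE the end position: a, b
```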


<span style="color:#41bc66">**create index "labels"**</span>

- we can set our own index labels instead of using the default integer index.

```python
import pandas as pd

data_list = [3, 6, 9, 8]
labels = ['a', 'b', 'c', 'd']
labeled_dict = dict(zip(labels, data_list))   # {'a': 3, 'b': 6, 'c': 9, 'd': 8}

data1 = pd.Series(data_list, index=labels)            # values plus a list of labels
data2 = pd.Series(labeled_dict)                       # dict built with zip(): keys become labels
data3 = pd.Series({'a': 3, 'b': 6, 'c': 9, 'd': 8})   # dict literal: keys become labels

print(data1)
print('------------------------')
print(data2)
print('------------------------')
print(data3)
print('------------------------')
```

```
a    3
b    6
c    9
d    8
dtype: int64
------------------------
a    3
b    6
c    9
d    8
dtype: int64
------------------------
a    3
b    6
c    9
d    8
dtype: int64
------------------------
```
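When you pass a dict *and* an `index`, pandas uses the index to select (and order) keys from the dict; labels missing from the dict get `NaN`. A small sketch where the label `'z'` is deliberately absent:

```python
import pandas as pd

scores = {'a': 3, 'b': 6, 'c': 9, 'd': 8}

subset = pd.Series(scores, index=['d', 'b', 'z'])  # picks d and b in that order; z becomes NaN
print(subset)
```

Because of the `NaN`, the resulting dtype is `float64`.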


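<span style="color:#41bc66">**Set operations (`&, |, ^`)**</span>

- `pd.Index` objects support set operations, which are useful for comparing and filtering labels:
    - intersection: values in both indexes (`a.intersection(b)`, formerly `a & b`)
    - union: values in either index (`a.union(b)`, formerly `a | b`)
    - symmetric difference: values in exactly one of the two indexes, not in both (`a.symmetric_difference(b)`, formerly `a ^ b`)
- In recent pandas versions (2.0+), `&`, `|` and `^` on an `Index` perform element-wise logical operations instead of set operations, so use the named methods.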

```python
import pandas as pd

a = pd.Index([0, 1, 3, 5, 7, 9])
b = pd.Index([2, 3, 5, 6, 9])

print(a)
print('--------------------------')
print(b)
print('--------------------------')
print(a.intersection(b))          # in both a and b
print('--------------------------')
print(a.union(b))                 # in a or b (or both)
print('--------------------------')
print(a.symmetric_difference(b))  # in a or b, but not in both
print('--------------------------')
```

```
Index([0, 1, 3, 5, 7, 9], dtype='int64')
--------------------------
Index([2, 3, 5, 6, 9], dtype='int64')
--------------------------
Index([3, 5, 9], dtype='int64')
--------------------------
Index([0, 1, 2, 3, 5, 6, 7, 9], dtype='int64')
--------------------------
Index([0, 1, 2, 6, 7], dtype='int64')
--------------------------
```

(pandas versions before 2.0 display these as `Int64Index([...], dtype='int64')`.)
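A related method is `difference()`, which keeps only the values found in the first index. As a sketch, the symmetric difference is the union of the two one-sided differences:

```python
import pandas as pd

a = pd.Index([0, 1, 3, 5, 7, 9])
b = pd.Index([2, 3, 5, 6, 9])

only_a = a.difference(b)   # in a but not in b -> [0, 1, 7]
only_b = b.difference(a)   # in b but not in a -> [2, 6]

print(only_a)
print(only_b)
print(only_a.union(only_b).equals(a.symmetric_difference(b)))  # True
```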

