# Pandas: Vectors and Matrices

<hr/>

## Native Data Structures in Python


Type | Collection | Syntax | Ordered | Indexed | Mutable | Passed By | Duplicates Allowed 
:------------: | :-------------:|:-------------:|:-------------:|:-------------:| :-------------:| :-------------:|:-------------:
`str` | characters | `"c1c2c3"` | &check; |&check; | &cross; | value | &check;
`list` | any data type | `[v1, v2.. vn]` | &check; |&check; |  &check; | reference | &check;
`tuple` | any data type | `(v1, v2.. vn)` | &check; | &check; |  &cross; | value | &check;
`set` | immutable types | `{v1, v2.. vn}` | &cross; | &cross; |  &check; | reference | &cross;
`dict` | any data type * | `{k1:v1,k2:v2.. kn:vn}` | &check;\*\*\* | &check; |  &check; | reference | &cross;\**

\* _**keys** must be of an immutable type; **values** can be any data type_ \
\** _**keys** cannot be duplicated; **values** can be duplicated_ \
\*\*\* _since Python 3.7, dictionaries preserve **insertion order**_
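
A minimal sketch of what the **Mutable** and **Passed By** columns mean in practice. The helper names `append_item` and `shout` are hypothetical and exist only for this illustration.

```python
def append_item(items):
    # Lists are mutable: this changes the caller's list in place.
    items.append(99)


def shout(text):
    # Strings are immutable: upper() builds a new string, and the
    # assignment only rebinds the local name inside this function.
    text = text.upper()
    return text


nums = [1, 2, 3]
append_item(nums)
print(nums)  # [1, 2, 3, 99]

word = "abc"
shout(word)
print(word)  # abc

# Sets drop duplicates; dictionary keys are unique but values may repeat.
print(len({1, 1, 2}))     # 2
print({"a": 1, "b": 1})   # {'a': 1, 'b': 1}
```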

<br/>
<hr/>
<br/>

* Four basic operations **(CRUD)**: 

1. **Create** (aka initialization)
2. **Read** (aka get element)
3. **Update** (aka append, insert, change, push)
4. **Delete** (aka remove, pop)



Type | Create | Read | Update | Delete | 
:---: | :---:|:---:|:---:|:----:| 
`str` | `foo = "c1c2c3"` | `foo[idx]` |  **N/A**<br/>_`foo.replace(val1, val2)`_ | **N/A**<br/>_`foo.replace(val, "")`_ |
`list` | `foo = [v1, v2.. vn]`  | `foo[idx]` | `foo[idx] = val` |`foo.remove(val)` or `foo.pop(idx)` |
`tuple` | `foo = (v1, v2.. vn)` | `foo[idx]` | **N/A**<br/>_Convert to list_ | **N/A**<br/>_Convert to list_ |
`set` |  `foo = {v1, v2.. vn}` | **N/A**<br/>_`val in foo`_ | **N/A** | `foo.pop()` or `foo.remove(val)` |
`dict` | `foo = {k1:v1,.. kn:vn}` | `foo[key]` | `foo[key] = val` | `foo.pop(key)` |



Type | Create | Read | Update | Delete | 
:---: | :---:|:---:|:---:|:----:| 
`str` | `"c1c2c3"` | `foo[idx]` |  **N/A**<br/><br/>_`foo.replace(v1, v2)`_ | **N/A**<br/><br/>_`foo.replace(val, "")`_ |
`list` | `[v1,.. vn]`  | `foo[idx]` | `foo[idx] = val` |`foo.remove(val)` or `foo.pop(idx)` |
`tuple` | `(v1,.. vn)` | `foo[idx]` | **N/A**<br/><br/>_Convert to list_ | **N/A**<br/><br/>_Convert to list_ |
`set` |  `{v1,.. vn}` | **N/A**<br/><br/>_`val in foo`_ | **N/A** | `foo.pop()` or `foo.remove(val)` |
`dict` | `{k1:v1,.. kn:vn}` | `foo[key]` | `foo[key] = val` | `foo.pop(key)` |



Type | Create | Insert | Read | Update | Delete | 
:---: | :---:|:---:|:---:|:---:|:----:| 
`str` | `"c1c2c3"` | `foo + "a"`| `foo[idx]` |  **N/A** | **N/A** |
`list` | `[v1,.. vn]`  | `foo.append(val)` | `foo[idx]` | `foo[idx] = val` |`foo.remove(val)` |
`tuple` | `(v1,.. vn)` | **N/A** | `foo[idx]` | **N/A** | **N/A** |
`set` |  `{v1,.. vn}` | `foo.add(val)` | **N/A** | **N/A** | `foo.remove(val)` |
`dict` | `{k1:v1,.. kn:vn}` | `foo[key] = val` | `foo[key]` | `foo[key] = val` | `foo.pop(key)` |
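
The operations in the tables above can be tried out as follows. This is a small sketch; the variable names and values (`letters`, `ages`, `seen`, `name`) are made up for illustration.

```python
# Create
letters = ["a", "b", "c"]
ages = {"ann": 31, "bob": 27}
seen = {1, 2, 3}
name = "c1c2c3"

# Insert
letters.append("d")
ages["cat"] = 45
seen.add(4)

# Read
first = letters[0]       # index-based read
bob_age = ages["bob"]    # key-based read
has_two = 2 in seen      # sets only support membership tests

# Update
letters[0] = "z"
ages["bob"] = 28
renamed = name.replace("c1", "x")  # strings: returns a new string

# Delete
letters.remove("b")
removed_age = ages.pop("ann")
seen.remove(1)

print(letters)        # ['z', 'c', 'd']
print(ages)           # {'bob': 28, 'cat': 45}
print(sorted(seen))   # [2, 3, 4]
print(name, renamed)  # c1c2c3 xc2c3
```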

<hr/>

## `pandas`


* `pandas` is a powerful Python library used for **data science**.


* It has functions for **analyzing**, **cleaning**, **exploring**, and **manipulating data**.


* Many modern **Machine Learning** and **Deep Learning** libraries store and manipulate data in a manner similar to `pandas`. 
<br/>

* The following `import` convention is used for pandas:

In [None]:
import pandas as pd

* Thus, whenever you see `pd.` in code, it’s referring to `pandas`.

## Data Structures in `pandas`

* To get started with pandas, you will need to get comfortable with its two workhorse **data structures**: 

1. **`pd.Series`**


2. **`pd.DataFrame`**

* While they are not a universal solution for every problem, they provide a solid foundation for a wide variety of data tasks.

## `pd.Series`

* A Series is a one-dimensional array-like object containing a sequence of values of the same type and an associated array of data labels, called its **index**.


* The simplest Series is formed from only an array of data:

In [None]:
obj = pd.Series([4, 7, -5, 3])

obj

* The Series printed above shows the **index on the left** and the **values on the right**. 


* Since **we did not specify an index** for the data, a **default index** consisting of the **integers 0 through N - 1** (where N is the length of the data) is created.

* You can get the index object of a Series via its `index` attribute:

In [None]:
obj.index

* You can get the array representation of the Series via its `values` attribute:

In [None]:
obj.values

* These attributes return library objects rather than plain Python lists: `index` returns a `pandas` index object and `values` returns a NumPy array. When you need ordinary Python lists, cast them using `list(..)`:

In [None]:
list(obj.index), list(obj.values)
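
As a complementary sketch, the types behind these attributes can be inspected directly. The `tolist()` method is an alternative to `list(..)` that also converts each element to a plain Python scalar.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# The default index is a pandas RangeIndex; the values are a NumPy array.
print(type(obj.index).__name__)   # RangeIndex
print(type(obj.values).__name__)  # ndarray

# tolist() returns plain Python lists holding plain Python ints.
print(obj.tolist())        # [4, 7, -5, 3]
print(obj.index.tolist())  # [0, 1, 2, 3]
```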

* Often, you'll want to create a Series with an index identifying each data point with a label:

In [None]:
obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

obj2

In [None]:
obj
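
With a labeled index, values can be read and updated by label instead of by position. A short sketch that rebuilds `obj2` so it runs on its own:

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# Read a single value by its label
print(obj2["a"])  # -5

# Update a value by its label
obj2["d"] = 6

# Select several labels at once by passing a list of labels
print(obj2[["c", "a", "d"]].tolist())  # [3, -5, 6]

# The `in` operator checks the index labels, not the values
print("b" in obj2)  # True
```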

## Filtering using Boolean Masks

`pd.Series` as a Vector
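
A minimal sketch of both ideas named above, assuming the same example data as `obj2`. Comparing a Series with a scalar yields a Boolean Series (often called a Boolean mask), which can then be used to filter. Arithmetic applies element by element, as with a mathematical vector.

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# Element-wise comparison produces a Boolean Series with the same index
mask = obj2 > 0
print(mask.tolist())  # [True, True, False, True]

# Indexing with the mask keeps only the rows where it is True
positives = obj2[mask]
print(positives.tolist())        # [4, 7, 3]
print(positives.index.tolist())  # ['d', 'b', 'c']

# Vector-style arithmetic: the operation is applied to every element
doubled = obj2 * 2
print(doubled.tolist())  # [8, 14, -10, 6]
```

Note that filtering and arithmetic both preserve the link between each value and its index label.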

# The data science pipeline



1. Get or collect data
2. Manipulate and process data
3. Model and analyze data
4. Visualize, evaluate, present, and communicate results

