<a href="https://colab.research.google.com/github/francodem/effective_pandas_book_lessons/blob/main/effective_pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Effective Pandas
#### by Matt Harrison
### Learning from pandas - Lessons and more

### Library import

In [None]:
import pandas as pd

### Data structures

The two most widely used pandas data structures are the Series, for array data, and the DataFrame, for tabular data.

A Series is a one-dimensional (1D) data structure, and a DataFrame is a two-dimensional (2D) data structure.
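As a quick check of the dimensionality claim, the sketch below builds one object of each kind and inspects `ndim`. The variable names and sample values are illustrative assumptions, not taken from the book.

```python
import pandas as pd

# Illustrative data (not from the book)
s = pd.Series([1, 2, 3], name='numbers')
frame = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

print(s.ndim)            # number of dimensions of the Series
print(frame.ndim)        # number of dimensions of the DataFrame
print(type(frame['a']))  # a single column of a DataFrame is itself a Series
```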

In [None]:
# Create an empty DataFrame with three columns and index labels 1, 2, 3
df = pd.DataFrame(columns=['name','last_name','email'], index=[1,2,3])

In [None]:
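# Set the values of the row with index label 1.
# .loc avoids chained assignment (df['name'][1] = ...), which can silently
# modify a copy instead of the DataFrame itself.
df.loc[1, 'name'] = "Franco"
df.loc[1, 'last_name'] = "Moreno"
df.loc[1, 'email'] = "franco@hello.com"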

In [None]:
# Read all the values of the row with index label 1
df.loc[1]

index,1
name,Franco
last_name,Moreno
email,franco@hello.com


## Series Introduction

A Series is used to model one-dimensional data. Besides the values, a Series object carries a few more pieces of data, including an index and a name. A common idea throughout pandas is the notion of an axis. Because a Series is one-dimensional, it has a single axis: the index.

**The fundamentals, modeled in plain Python:**

In [None]:
# Model a series as a plain dictionary with an index, the data, and a name
series = {
  'index': [0, 1, 2, 3],
  'data': [145, 142, 38, 13],
  'name': 'songs'
}

In [None]:
# A function to access a series element by its index label
def get(series, idx):
  # Find the position of the label, then return the value at that position
  value_idx = series['index'].index(idx)
  return series['data'][value_idx]

In [None]:
# Getting the element
get(series=series, idx=1)

142

In [None]:
# The same model works with string labels as the index
songs = {
  'index': ['Paul', 'John', 'George', 'Ringo'],
  'data': [145, 142, 38, 13],
  'name': 'counts'
}

In [None]:
get(series=songs, idx='John')

142

**Now using a Pandas Series:**

In [None]:
# Making the Series
songs2 = pd.Series([145, 142, 38, 13], name='counts')

In [None]:
# Display songs2
songs2

index,counts
0,145
1,142
2,38
3,13


**Fact:** The output looks like a 2D array, but the index is not part of the values. We can confirm this by checking the shape.

In [None]:
songs2.shape

(4,)

In [None]:
# Making songs3 with string labels as the index
songs3 = pd.Series([145, 142, 38, 13], name='counts', index=['Paul', 'John', 'George', 'Ringo'])

In [None]:
# Index inspection
songs3.index

Index(['Paul', 'John', 'George', 'Ringo'], dtype='object')
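With a labeled index in place, pandas can do the lookup that the dictionary-based `get` function performed by hand. The sketch below is a minimal illustration using `.loc` (label-based) and `.iloc` (position-based); the names `john_count` and `second_count` are introduced here only for the example.

```python
import pandas as pd

songs3 = pd.Series(
    [145, 142, 38, 13],
    name='counts',
    index=['Paul', 'John', 'George', 'Ringo'],
)

# Label-based lookup, analogous to get(series=songs, idx='John')
john_count = songs3.loc['John']

# Position-based lookup of the second element, independent of the labels
second_count = songs3.iloc[1]

print(john_count, second_count)
```

Both lookups return the same element here because 'John' is the label at position 1.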

#### Storing a class into a Series - Heterogeneous or mixed types

In [None]:
# Storing a class into a Series
class Foo:
  pass

ringo = pd.Series(
  ['Richard', 'Starkey', 13, Foo()],
  name='ringo'
)

In [None]:
# Getting the ringo Series
ringo

index,ringo
0,Richard
1,Starkey
2,13
3,<__main__.Foo object at 0x795ed1e90790>
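One way to see what mixed types mean in practice is to inspect the dtype and the Python type of each element. This is a small sketch that rebuilds the same `ringo` Series so that it runs on its own.

```python
import pandas as pd

class Foo:
    pass

ringo = pd.Series(['Richard', 'Starkey', 13, Foo()], name='ringo')

# A Series holding mixed Python objects falls back to the generic object dtype
print(ringo.dtype)

# Each element keeps its original Python type
element_types = [type(value).__name__ for value in ringo]
print(element_types)
```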


### The NaN value

When pandas determines that a Series holds numeric values but cannot find a number to represent an entry, it uses **NaN**.

In [None]:
import numpy as np

# A Series with one missing value
nan_series = pd.Series([2, np.nan])

In [None]:
# Getting the nan_series
nan_series

index,0
0,2.0
1,NaN


**Fact:** The Series has dtype float64 because NaN is a floating-point value. The default int64 dtype, on the other hand, cannot represent NaN.
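The effect of a single NaN on the dtype can be checked directly. The sketch below compares two small Series; the names `ints` and `with_nan` are illustrative.

```python
import numpy as np
import pandas as pd

ints = pd.Series([2, 3])           # no missing values
with_nan = pd.Series([2, np.nan])  # one missing value

print(ints.dtype)      # an integer dtype
print(with_nan.dtype)  # a float dtype, since NaN is a float

# isna() marks which entries are missing
missing_mask = with_nan.isna().tolist()
print(missing_mask)
```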

**Fact 2:** The pandas `count()` method ignores NaN values, so it returns only the number of non-missing entries.

In [None]:
# Printing the count of nan_series
nan_series.count()

1
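To make the difference concrete, `count()` can be compared with `size`, which includes missing entries. A minimal sketch that rebuilds `nan_series` so that it runs on its own:

```python
import numpy as np
import pandas as pd

nan_series = pd.Series([2, np.nan])

non_missing = nan_series.count()     # non-missing entries only
total = nan_series.size              # all entries, including NaN
missing = nan_series.isna().sum()    # number of missing entries

print(non_missing, total, missing)
```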

#### Optional Integer Support for NaN - P.17