# Pandas: Introduction

## NumPy

NumPy is the fundamental package for scientific computing with Python. Among other things, it contains:

- A powerful N-dimensional array object
- Sophisticated (broadcasting) functions
- Tools for integrating C/C++ and Fortran code
- Useful linear algebra, Fourier transform, and random number capabilities
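
Broadcasting is the least self-explanatory item in the list above, so here is a minimal sketch of it. The offset and scale values are hypothetical and chosen only for illustration.

```python
import numpy as np

# A 3x5 array and a 1-D array holding one value per column
a = np.arange(15).reshape(3, 5)
col_offsets = np.array([10, 20, 30, 40, 50])  # hypothetical values

# Shapes (3, 5) and (5,): col_offsets is applied to every row of a
shifted = a + col_offsets
print(shifted.shape)

# A column vector of shape (3, 1) is stretched across the columns instead
row_scale = np.array([[1], [2], [3]])  # hypothetical values
scaled = a * row_scale
print(scaled)
```

No loop or explicit copy of the smaller array is needed; NumPy lines up the trailing dimensions and repeats the smaller operand along any dimension of size 1.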

In [None]:
# Install NumPy using pip
!pip install --upgrade pip
!pip install numpy

In [None]:
# Import NumPy module
import numpy as np

### Inspecting the array

In [None]:
# Create array
a = np.arange(15).reshape(3, 5)  # Values 0-14 arranged as 3 rows by 5 columns
b = np.zeros((3, 5))  # Array of zeros
c = np.ones((2, 3, 4), dtype=np.int16)  # Array of ones with an explicit data type
d = np.ones((3, 5))  # Array of ones (default dtype float64)

In [None]:
a.shape # Array dimension

In [None]:
len(b)  # Length of the first axis (number of rows)

In [None]:
c.ndim # Number of array dimensions

In [None]:
a.size # Number of array elements

In [None]:
b.dtype # Data type of array elements

In [None]:
c.dtype.name # Name of data type

In [None]:
c.astype(float) # Convert an array type to a different type
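
Note that `astype` returns a new array and leaves the original untouched. A minimal sketch, assuming the same `c` as in the cells above:

```python
import numpy as np

c = np.ones((2, 3, 4), dtype=np.int16)

# astype builds a converted copy; c itself keeps its dtype
c_float = c.astype(float)

print(c.dtype.name, c.itemsize, c.nbytes)
print(c_float.dtype.name, c_float.itemsize, c_float.nbytes)
```

To keep the converted values, assign the result to a name, as done with `c_float` here.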

### Basic mathematical operations

In [None]:
# Create array
a = np.arange(15).reshape(3, 5)  # Values 0-14 arranged as 3 rows by 5 columns
b = np.zeros((3, 5))  # Array of zeros
c = np.ones((2, 3, 4), dtype=np.int16)  # Array of ones with an explicit data type
d = np.ones((3, 5))  # Array of ones (default dtype float64)

In [None]:
np.add(a,b) # Addition

In [None]:
np.subtract(a, b)  # Subtraction

In [None]:
np.divide(a,d) # Division

In [None]:
np.multiply(a,d) # Multiplication

In [None]:
np.array_equal(a, b)  # Array-wise comparison: True only if shapes and all elements match

### Aggregate functions

In [None]:
# Create array
a = np.arange(15).reshape(3, 5)  # Values 0-14 arranged as 3 rows by 5 columns
b = np.zeros((3, 5))  # Array of zeros
c = np.ones((2, 3, 4), dtype=np.int16)  # Array of ones with an explicit data type
d = np.ones((3, 5))  # Array of ones (default dtype float64)

In [None]:
a.sum() # Array-wise sum

In [None]:
a.min() # Array-wise min value

In [None]:
a.mean() # Array-wise mean

In [None]:
a.max(axis=0)  # Max value of each column (axis=0 collapses the rows)

In [None]:
np.std(a) # Standard deviation
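
The `axis` argument decides which dimension an aggregate collapses. A minimal sketch, assuming the same 3x5 array `a` as above:

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

col_max = a.max(axis=0)  # collapse the rows: one value per column
row_max = a.max(axis=1)  # collapse the columns: one value per row
row_sum = a.sum(axis=1)  # sum of each row

print(col_max)  # [10 11 12 13 14]
print(row_max)  # [ 4  9 14]
print(row_sum)  # [10 35 60]
```

Without `axis`, the aggregate runs over the flattened array and returns a single scalar.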

### Subsetting, slicing, and indexing

In [None]:
# Create array
a = np.arange(15).reshape(3, 5)  # Values 0-14 arranged as 3 rows by 5 columns
b = np.zeros((3, 5))  # Array of zeros
c = np.ones((2, 3, 4), dtype=np.int16)  # Array of ones with an explicit data type
d = np.ones((3, 5))  # Array of ones (default dtype float64)

In [None]:
a[1,2] # Select element of row 1 and column 2

In [None]:
a[0:2]  # Select the rows at index 0 and 1

In [None]:
a[:1]  # Select row 0, kept as a 2-D array of shape (1, 5)

In [None]:
a[-1:] # Select all items from last row

In [None]:
a[a<2] # Select elements from 'a' that are less than 2
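
Two details of the selections above are easy to miss: an integer index drops a dimension while a slice keeps it, and a basic slice is a view on the original data while a boolean mask produces a new array. A minimal sketch, assuming the same `a`:

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

single_row = a[1]    # integer index drops the axis: shape (5,)
row_block = a[1:2]   # slice keeps the axis: shape (1, 5)
first_two = a[0:2]   # basic slice: shares memory with a
small = a[a < 2]     # boolean mask: independent 1-D array

print(single_row.shape, row_block.shape)
print(np.shares_memory(a, first_two), np.shares_memory(a, small))
```

Because `first_two` is a view, writing to it would also change `a`; call `.copy()` on the slice when an independent array is needed.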

### Array manipulation

In [None]:
# Create array
a = np.arange(15).reshape(3, 5)  # Values 0-14 arranged as 3 rows by 5 columns
b = np.zeros((3, 5))  # Array of zeros
c = np.ones((2, 3, 4), dtype=np.int16)  # Array of ones with an explicit data type
d = np.ones((3, 5))  # Array of ones (default dtype float64)

In [None]:
np.transpose(a) # Transpose array 'a'

In [None]:
a.ravel() # Flatten the array

In [None]:
a.reshape(5, -1)  # Reshape to 5 rows without changing the data; -1 lets NumPy infer the other dimension

In [None]:
np.append(a, b)  # Append items to the array (both inputs are flattened when axis is not given)

In [None]:
np.concatenate((a,d), axis=0) # Concatenate arrays

In [None]:
np.vsplit(a, 3)  # Split the array vertically (by rows) into 3 equal parts

In [None]:
np.hsplit(a, 5)  # Split the array horizontally (by columns) into 5 equal parts
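
The shapes these calls produce are worth checking once. A minimal sketch, assuming the same `a` and `b` as above: `reshape` accepts `-1` for one dimension to be inferred, `np.append` flattens its inputs unless `axis` is given, and an integer passed to `vsplit` is a number of equal parts.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
b = np.zeros((3, 5))

reshaped = a.reshape(5, -1)        # -1 is inferred as 3
flat = np.append(a, b)             # no axis: both inputs are flattened
stacked = np.append(a, b, axis=0)  # axis=0: rows of b added below a
parts = np.vsplit(a, 3)            # list of 3 arrays, one row each

print(reshaped.shape, flat.shape, stacked.shape)
print(len(parts), parts[0].shape)
```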

## Pandas

Pandas is an open-source, BSD-licensed library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Pandas DataFrames are the most widely used in-memory representation of complex data collections in Python.

In [None]:
# Install pandas using pip
!pip install pandas


In [None]:
# Import NumPy and Pandas modules
import numpy as np
import pandas as pd

In [None]:
# Sample dataframe df
df = pd.DataFrame({'num_legs': [2, 4, np.nan, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, np.nan, 1, 8]},
                   index=['falcon', 'dog', 'spider', 'fish'])
df # Display dataframe df

In [None]:
# Another sample dataframe df1 - built from a NumPy array, with a datetime index and labeled columns
dates = pd.date_range('20230901', periods=6)  # Six consecutive days starting 2023-09-01
df1 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df1 # Display dataframe df1

### Viewing the data

In [None]:
df1.head(2) # View top data

In [None]:
df1.tail(2) # View bottom data

In [None]:
df1.index # Display the index (row labels)

In [None]:
df1.dtypes # Inspect datatypes

In [None]:
df1.describe() # Display quick statistics summary of data

### Subsetting, slicing, and indexing

In [None]:
# Reminder: the current contents of df1
df1

In [None]:
df1.T # Transpose data

In [None]:
df1.sort_index(axis=1, ascending=False) # Sort the columns by label, in descending order

In [None]:
df1.sort_values(by='B') # Sort by values

In [None]:
df1['A'] # Select column A

In [None]:
df1[0:3] # Select the rows at positions 0 to 2

In [None]:
df1.loc['20230902':'20230904'] # Select rows by index label; both endpoints are included

In [None]:
df1.loc[:, ['A', 'B']] # Select on a multi-axis by label

In [None]:
df1.iloc[3] # Select via the position of the passed integers

In [None]:
df1[df1 > 0] # Select values from a DataFrame where a boolean condition is met

In [None]:
df2 = df1.copy() # Copy the df1 dataset to df2
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three'] # Add a column E of string labels
df2[df2['E'].isin(['two', 'four'])] # Use isin method for filtering
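
One difference between the two selectors used above: `.loc` slices by label and includes both endpoints, while `.iloc` slices by position and excludes the stop, like a Python list. A minimal sketch, using a deterministic stand-in for `df1` (the integer values are an assumption made so the result is reproducible):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20230901', periods=6)
# Deterministic stand-in for df1; the notebook itself uses random values
df1 = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

by_label = df1.loc['20230902':'20230904', ['A', 'B']]  # 3 rows: Sep 2, 3 and 4
by_position = df1.iloc[1:3, 0:2]                       # 2 rows: positions 1 and 2

print(by_label.shape, by_position.shape)
```

To get the same three rows by position, the stop has to be one past the last wanted row: `df1.iloc[1:4, 0:2]`.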

### Missing data

Pandas primarily uses the value `np.nan` to represent missing data. By default, missing values are excluded from calculations.
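
A minimal sketch of what "excluded by default" means in practice, using the `num_legs` column of the sample DataFrame. The `skipna` parameter of the reduction methods controls this behavior.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, np.nan, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, np.nan, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

mean_default = df['num_legs'].mean()             # NaN skipped: (2 + 4 + 0) / 3
mean_strict = df['num_legs'].mean(skipna=False)  # NaN propagates to the result
non_missing = df['num_legs'].count()             # counts only non-missing values

print(mean_default, mean_strict, non_missing)
```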

In [None]:
df = pd.DataFrame({'num_legs': [2, 4, np.nan, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, np.nan, 1, 8]},
                   index=['falcon', 'dog', 'spider', 'fish'])

In [None]:
df.dropna(how='any') # Drop any rows that have missing data

In [None]:
df.dropna(how='any', axis=1) # Drop any columns that have missing data

In [None]:
df.fillna(value=5) # Fill missing data with value 5

In [None]:
pd.isna(df) # To get boolean mask where data is missing

### Reading and writing files

In [None]:
df = pd.DataFrame({'num_legs': [2, 4, np.nan, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, np.nan, 1, 8]},
                   index=['falcon', 'dog', 'spider', 'fish'])

In [None]:
df.to_csv('animales.csv') # Write to CSV file

In [None]:
pd.read_csv('animales.csv', index_col=0) # Read from CSV file, using the first column as the index

In [None]:
df.to_excel('animales.xlsx', sheet_name='Pandas') # Write to Microsoft Excel file (needs an Excel engine such as openpyxl)

In [None]:
pd.read_excel('animales.xlsx', sheet_name='Pandas', index_col=None, na_values=['NA']) # Read from Microsoft Excel file
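
`to_csv` writes the index as the first, unnamed column, so how the file is read back determines whether the row labels are restored. A minimal sketch of the round trip, using an in-memory buffer instead of a file on disk (an assumption made to keep the example self-contained):

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, np.nan, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, np.nan, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

csv_text = df.to_csv()  # with no path, to_csv returns the CSV as a string

# Default read: the saved index comes back as an ordinary column
plain = pd.read_csv(io.StringIO(csv_text))
# index_col=0: the first column is used as the index again
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.columns.tolist())
print(restored.index.tolist())
```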

### Plots

In [None]:
# Install Matplotlib using pip
!pip install matplotlib

In [None]:
from matplotlib import pyplot as plt # Import Matplotlib module

In [None]:
# Generate random time-series data
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2022', periods=1000))
ts.head()

In [None]:
ts = ts.cumsum()
ts.plot() # Plot graph
plt.show()

In [None]:
# On a DataFrame, the plot() method is convenient to plot all of the columns with labels
df4 = pd.DataFrame(np.random.randn(1000, 4), index=ts.index, columns=['A', 'B', 'C', 'D'])
df4 = df4.cumsum()
df4.head()

In [None]:
df4.plot()
plt.show()

### Plots with Plotly

In [None]:
# Install Plotly and its Jupyter dependencies using pip
!pip install "jupyterlab>=3" "ipywidgets>=7.6"
!pip install jupyter-dash
!pip install plotly

In [None]:
import plotly.express as px

In [None]:
fig = px.bar(df4)
fig.show()


In [None]:
# A map with Plotly
# If you use Mapbox, some tile styles require an access token
import pandas as pd
us_cities = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/us-cities-top-1k.csv")

import plotly.express as px

fig = px.scatter_mapbox(us_cities, lat="lat", lon="lon", hover_name="City", hover_data=["State", "Population"],
                        color_discrete_sequence=["fuchsia"], zoom=3, height=300)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()