<a href="https://colab.research.google.com/github/jesusrevilla/mineria-de-datos/blob/main/primer-parcial/10_minutes_to_pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 10 minutes to pandas   
[Reference](https://pandas.pydata.org/pandas-docs/version/1.3/user_guide/10min.html)

In [1]:
import numpy as np
import pandas as pd

Object creation

Create a `Series` by passing a list of values, letting pandas build the default integer index.

In [3]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

Unnamed: 0,0
0,1.0
1,3.0
2,5.0
3,NaN
4,6.0
5,8.0
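
The output above shows every value as a float, even though most inputs were integers. A minimal sketch of why, and of how an explicit index could be supplied instead of the default one. The labels `a` to `f` are hypothetical, chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Same values as above: the single NaN forces a floating-point dtype.
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s.dtype)              # float64
print(type(s.index).__name__)  # RangeIndex

# Hypothetical string labels replacing the default integer index.
s_labeled = pd.Series([1, 3, 5, np.nan, 6, 8], index=list("abcdef"))
print(s_labeled["c"])       # 5.0
```

Label-based lookup such as `s_labeled["c"]` only works because the index was passed explicitly; with the default index the same element is reached with `s[2]`.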


Create a `DataFrame` by passing a NumPy array, with a datetime index and labeled columns.

In [7]:
dates = pd.date_range("20130101", periods=6)
dates

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [9]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df

Unnamed: 0,A,B,C,D
2013-01-01,-0.555056,-0.705508,1.873881,0.768887
2013-01-02,1.880186,-1.227015,-0.073405,-0.256471
2013-01-03,0.208087,-0.138189,0.173231,-1.201815
2013-01-04,-0.521747,1.964468,-0.258599,1.417918
2013-01-05,-1.206785,-0.434811,-1.266266,-1.050509
2013-01-06,-0.426924,1.245972,-0.331731,-1.491287
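
Because `np.random.randn` draws new values on every run, the table above will differ each time the cell is executed. One possible way to make it reproducible is to use a seeded generator. This is a sketch, not the notebook's own method; the seed `0` and the name `df_seeded` are arbitrary assumptions:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)

def make_frame(seed):
    """Build a 6x4 frame of standard-normal draws from a seeded generator."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        rng.standard_normal((6, 4)), index=dates, columns=list("ABCD")
    )

df_seeded = make_frame(0)
df_again = make_frame(0)

# The same seed yields the same table.
print(df_seeded.equals(df_again))  # True
print(df_seeded.shape)             # (6, 4)
```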


Create a `DataFrame` by passing a dictionary of objects that can be converted into a series-like structure.

In [12]:
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)
df2

Unnamed: 0,A,B,C,D,E,F
0,1.0,2013-01-02,1.0,3,test,foo
1,1.0,2013-01-02,1.0,3,train,foo
2,1.0,2013-01-02,1.0,3,test,foo
3,1.0,2013-01-02,1.0,3,train,foo
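
In the table above, the scalar entries (`1.0`, the timestamp, `"foo"`) are repeated to match the length of the array-like entries. A small sketch of that broadcasting, using a hypothetical three-row frame; it also shows that pandas raises a `ValueError` when every value is a scalar and no index is given:

```python
import pandas as pd

# Scalars are broadcast to the length of the Series in column "C".
frame = pd.DataFrame({"A": 1.0, "C": pd.Series([10, 20, 30]), "F": "foo"})
print(frame.shape)          # (3, 3)
print(frame["A"].tolist())  # [1.0, 1.0, 1.0]

# With only scalars there is no length to broadcast to.
try:
    pd.DataFrame({"A": 1.0, "F": "foo"})
    all_scalars_ok = True
except ValueError:
    all_scalars_ok = False
print(all_scalars_ok)       # False
```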


The columns of the resulting `DataFrame` have different dtypes.

In [13]:
df2.dtypes

Unnamed: 0,0
A,float64
B,datetime64[s]
C,float32
D,int32
E,category
F,object
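
Once the dtypes are known, columns can be picked by type rather than by name. A minimal sketch using `select_dtypes` on the same `df2` definition; the variable names `numeric` and `categorical` are illustrative:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)

# Keep only the numeric columns (float64, float32, int32 here).
numeric = df2.select_dtypes(include="number")
print(list(numeric.columns))      # ['A', 'C', 'D']

# Keep only the categorical column.
categorical = df2.select_dtypes(include="category")
print(list(categorical.columns))  # ['E']
```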


In [14]:
df2.A

Unnamed: 0,A
0,1.0
1,1.0
2,1.0
3,1.0
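
Attribute access such as `df2.A` is a shortcut for `df2["A"]`, but it does not work for every column name. The sketch below uses a hypothetical frame with deliberately awkward names to show where bracket access is needed:

```python
import pandas as pd

frame = pd.DataFrame({"A": [1.0, 1.0], "my col": [1, 2], "count": [3, 4]})

# Attribute access and bracket access return the same column.
print(frame.A.equals(frame["A"]))  # True

# A name with a space is not a valid Python identifier, so use brackets.
print(frame["my col"].tolist())    # [1, 2]

# "count" collides with the DataFrame.count method: the attribute is the
# method, and only brackets reach the column.
print(callable(frame.count))       # True
print(frame["count"].tolist())     # [3, 4]
```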


Viewing data

In [15]:
df.head()

Unnamed: 0,A,B,C,D
2013-01-01,-0.555056,-0.705508,1.873881,0.768887
2013-01-02,1.880186,-1.227015,-0.073405,-0.256471
2013-01-03,0.208087,-0.138189,0.173231,-1.201815
2013-01-04,-0.521747,1.964468,-0.258599,1.417918
2013-01-05,-1.206785,-0.434811,-1.266266,-1.050509


In [16]:
df.tail()

Unnamed: 0,A,B,C,D
2013-01-02,1.880186,-1.227015,-0.073405,-0.256471
2013-01-03,0.208087,-0.138189,0.173231,-1.201815
2013-01-04,-0.521747,1.964468,-0.258599,1.417918
2013-01-05,-1.206785,-0.434811,-1.266266,-1.050509
2013-01-06,-0.426924,1.245972,-0.331731,-1.491287
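
Both `head` and `tail` return five rows by default, which is why each call above shows five of the six rows. A short sketch of passing an explicit row count, using a stand-in frame filled with integers rather than the random data above:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

print(len(df.head()))    # 5, the default
print(len(df.head(2)))   # 2

# The last three rows start at 2013-01-04.
last_three = df.tail(3)
print(last_three.index[0] == pd.Timestamp("2013-01-04"))  # True
```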


In [17]:
df.index

DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [18]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [19]:
df.dtypes

Unnamed: 0,0
A,float64
B,float64
C,float64
D,float64


In [20]:
df.to_numpy()

array([[-0.55505617, -0.70550765,  1.87388071,  0.76888667],
       [ 1.88018631, -1.22701472, -0.07340526, -0.25647144],
       [ 0.20808693, -0.13818852,  0.17323121, -1.20181512],
       [-0.52174675,  1.9644678 , -0.25859908,  1.41791764],
       [-1.20678531, -0.43481079, -1.26626634, -1.05050937],
       [-0.42692428,  1.24597176, -0.33173129, -1.49128748]])
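
`to_numpy` is cheap here because all four columns of `df` share the `float64` dtype. When columns have different dtypes, NumPy needs a single dtype that can hold everything, which may end up being `object`. A minimal sketch with two hypothetical frames:

```python
import pandas as pd

# All columns are float64, so the array keeps that dtype.
homogeneous = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
print(homogeneous.to_numpy().dtype)  # float64

# Floats mixed with strings fall back to a generic object array.
mixed = pd.DataFrame({"x": [1.0, 2.0], "label": ["a", "b"]})
print(mixed.to_numpy().dtype)        # object
```

Note that the resulting array carries no index or column labels.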

In [21]:
df.describe()

Unnamed: 0,A,B,C,D
count,6.0,6.0,6.0,6.0
mean,-0.103707,0.117486,0.019518,-0.302213
std,1.070773,1.228025,1.031978,1.173995
min,-1.206785,-1.227015,-1.266266,-1.491287
25%,-0.546729,-0.637833,-0.313448,-1.163989
50%,-0.474336,-0.2865,-0.166002,-0.65349
75%,0.049334,0.899932,0.111572,0.512547
max,1.880186,1.964468,1.873881,1.417918
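
The `50%` row of `describe` is the median, and `std` is the sample standard deviation. The sketch below checks both against column `A` as printed above (rounded to six decimals), and requests extra percentiles through the `percentiles` argument; the name `col_a` is illustrative:

```python
import numpy as np
import pandas as pd

# Column A from the table above, rounded as displayed.
col_a = pd.DataFrame(
    {"A": [-0.555056, 1.880186, 0.208087, -0.521747, -1.206785, -0.426924]}
)
summary = col_a.describe()

# "50%" matches the median; "std" matches Series.std (ddof=1).
print(np.isclose(summary.loc["50%", "A"], col_a["A"].median()))  # True
print(np.isclose(summary.loc["std", "A"], col_a["A"].std()))     # True

# Ask for the 10th and 90th percentiles as well.
custom = col_a.describe(percentiles=[0.1, 0.9])
print("10%" in custom.index, "90%" in custom.index)              # True True
```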


In [22]:
df.T

Unnamed: 0,2013-01-01,2013-01-02,2013-01-03,2013-01-04,2013-01-05,2013-01-06
A,-0.555056,1.880186,0.208087,-0.521747,-1.206785,-0.426924
B,-0.705508,-1.227015,-0.138189,1.964468,-0.434811,1.245972
C,1.873881,-0.073405,0.173231,-0.258599,-1.266266,-0.331731
D,0.768887,-0.256471,-1.201815,1.417918,-1.050509,-1.491287


In [23]:
df.sort_index(axis=1, ascending=False)

Unnamed: 0,D,C,B,A
2013-01-01,0.768887,1.873881,-0.705508,-0.555056
2013-01-02,-0.256471,-0.073405,-1.227015,1.880186
2013-01-03,-1.201815,0.173231,-0.138189,0.208087
2013-01-04,1.417918,-0.258599,1.964468,-0.521747
2013-01-05,-1.050509,-1.266266,-0.434811,-1.206785
2013-01-06,-1.491287,-0.331731,1.245972,-0.426924
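
With `axis=1` the column labels are sorted, which is why the columns above appear as D, C, B, A. The same call with `axis=0` sorts the row labels instead. A brief sketch on a small stand-in frame:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=3)
df = pd.DataFrame(np.arange(6).reshape(3, 2), index=dates, columns=["A", "B"])

# axis=1 sorts by column label.
by_cols = df.sort_index(axis=1, ascending=False)
print(list(by_cols.columns))  # ['B', 'A']

# axis=0 sorts by row label, here newest date first.
by_rows = df.sort_index(axis=0, ascending=False)
print(by_rows.index[0] == pd.Timestamp("2013-01-03"))  # True

# sort_index returns a new object; the original order is untouched.
print(list(df.columns))       # ['A', 'B']
```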


In [24]:
df.sort_values(by="B")

Unnamed: 0,A,B,C,D
2013-01-02,1.880186,-1.227015,-0.073405,-0.256471
2013-01-01,-0.555056,-0.705508,1.873881,0.768887
2013-01-05,-1.206785,-0.434811,-1.266266,-1.050509
2013-01-03,0.208087,-0.138189,0.173231,-1.201815
2013-01-06,-0.426924,1.245972,-0.331731,-1.491287
2013-01-04,-0.521747,1.964468,-0.258599,1.417918
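
`sort_values` sorts ascending by default, as in the output above. It also accepts a list of columns and a matching list of directions. A minimal sketch on a hypothetical frame with `group` and `score` columns:

```python
import pandas as pd

df = pd.DataFrame({"group": ["b", "a", "b", "a"], "score": [2, 5, 9, 1]})

# Single key, largest value first.
desc = df.sort_values(by="score", ascending=False)
print(desc["score"].tolist())   # [9, 5, 2, 1]

# Two keys: group ascending, then score descending within each group.
multi = df.sort_values(by=["group", "score"], ascending=[True, False])
print(multi["group"].tolist())  # ['a', 'a', 'b', 'b']
print(multi["score"].tolist())  # [5, 1, 9, 2]
```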
