# **Clean and Upload**

### 1. [Imports and connection](#imp_con)

   - ##### 1.1 [Library imports](#importacion_lib)
   
   - ##### 1.2 [DB connection](#conexion)

### 2. [CSV cleaning](#clean)

   - ##### 2.1 [Actor](#actor)
   
   - ##### 2.2 [Category](#category)
   
   - ##### 2.3 [Language](#language)
   
   - ##### 2.4 [Old HDD](#old_hdd)
   
   - ##### 2.5 [Film](#film)
   
   - ##### 2.6 [Inventory](#inventory)
      
   - ##### 2.7 [Rental](#rental)
 

## 1 - Imports and connection <a name="imp_con"/>
***
***

### 1.1 - Library imports <a name="importacion_lib"/>
***

In [1]:
import pandas as pd
import numpy as np
from sqlalchemy import create_engine
from password import *
from src.video_func import *
import warnings
warnings.filterwarnings('ignore')  
pd.set_option('display.max_columns', None) 


### 1.2 - DB connection <a name="conexion"/>
---

After examining all the .CSV files in the data folder, we evaluated the relationships between the tables and created a database (DB) according to the following rules:
- A film can have one or more languages (both the original language and that of the copy)
- A film can have only one category
- A film can have many copies, but each copy can be in only one store
- An employee can be associated with only one store
- A rental can be created by only one employee
- We create a directors table even though it will be empty for now.


For many of the tables we created we have no data to fill them. However, we assume we are building a DB for a video rental store and that, for the business to run properly, these tables will be populated in the future.

With this, the DB schema is as follows:

![schm](../images/SCHEMA.png)

We now connect to the DB.

In [2]:
str_conn = f'mysql+pymysql://{usuario}:{password}@localhost:3306/Video_Club'
# Despite its name, `cursor` is a SQLAlchemy Engine (not a DB-API cursor)
cursor = create_engine(str_conn)
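
The connection string above interpolates the credentials as-is. If the password contained characters such as `@` or `/`, the URL could be parsed incorrectly. The following is a minimal sketch of one way to guard against that with the standard library; `build_conn_str` is a hypothetical helper, not part of the original notebook, and the default host, port and database name are simply copied from the cell above.

```python
from urllib.parse import quote_plus

def build_conn_str(user, pwd, host="localhost", port=3306, db="Video_Club"):
    """Hypothetical helper: build the MySQL URL with percent-encoded credentials."""
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(pwd)}@{host}:{port}/{db}"

# Example with a made-up password containing reserved characters
example_url = build_conn_str("root", "p@ss/word")
print(example_url)
```

The resulting string can be passed to `create_engine` in the same way as `str_conn`.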

## 2 - CSV cleaning <a name="clean"/>
---
---

We now clean the CSV files and then load all the available data into the DB. Because of the relationships described above, the tables without foreign keys must be loaded first.
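
The load order can be derived mechanically from the foreign-key dependencies. The sketch below uses the standard-library `graphlib` module (Python 3.9+). The dependency edges are assumptions made for illustration; the authoritative ones are in the schema image.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Assumed foreign-key dependencies: table -> tables it references.
# These edges are illustrative only; check them against the real schema.
depends_on = {
    "Actors": set(),
    "Category": set(),
    "languages": set(),
    "films": {"Category"},
    "Inventory": {"films"},
    "Rental": {"Inventory"},
}

# static_order() yields every table after all the tables it depends on
load_order = list(TopologicalSorter(depends_on).static_order())
print(load_order)
```

Any order produced this way loads parent tables before the tables that reference them.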

### 1 - Actor <a name="actor"/>
***

In [3]:
act = pd.read_csv('../data/raw/actor.csv')
act.head()

Unnamed: 0,actor_id,first_name,last_name,last_update
0,1,PENELOPE,GUINESS,2006-02-15 04:34:33
1,2,NICK,WAHLBERG,2006-02-15 04:34:33
2,3,ED,CHASE,2006-02-15 04:34:33
3,4,JENNIFER,DAVIS,2006-02-15 04:34:33
4,5,JOHNNY,LOLLOBRIGIDA,2006-02-15 04:34:33


In [4]:
# Table size
act.shape

(200, 4)

In [5]:
# Check for duplicates on first name and last name
act[['first_name','last_name']].duplicated().sum()

1

In [6]:
# Drop the duplicated record and renumber the ids
act_index = act[act[['first_name','last_name']].duplicated()].index
act = act.drop(index=act_index).reset_index(drop=True)
act['actor_id'] = range(1, act.shape[0] + 1)
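
The same deduplication can be written with `drop_duplicates`. Renumbering changes the id of every actor that comes after the removed row, so the sketch below also keeps an old-to-new id mapping. That mapping only matters if some other file references the original `actor_id`, which is an assumption to verify. The data here is made up for illustration.

```python
import pandas as pd

act_demo = pd.DataFrame({
    "actor_id": [1, 2, 3, 4],
    "first_name": ["ANN", "BOB", "ANN", "CAT"],
    "last_name": ["LEE", "RAY", "LEE", "FOX"],
})

# Keep the first occurrence of each (first_name, last_name) pair
dedup = (act_demo
         .drop_duplicates(subset=["first_name", "last_name"], keep="first")
         .reset_index(drop=True))

# Old id -> new id, in case another table still uses the old ids
id_map = dict(zip(dedup["actor_id"].tolist(), range(1, len(dedup) + 1)))
dedup["actor_id"] = dedup["actor_id"].map(id_map)
print(id_map)
```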

In [7]:
act.last_update.unique()  # last_update gives us no information

array(['2006-02-15 04:34:33'], dtype=object)

In [8]:
act.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 199 entries, 0 to 198
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   actor_id     199 non-null    int64 
 1   first_name   199 non-null    object
 2   last_name    199 non-null    object
 3   last_update  199 non-null    object
dtypes: int64(1), object(3)
memory usage: 6.3+ KB


In [11]:
# cols_info() is a function from the video_func module in the src folder
cols_info(act)

Unnamed: 0,Col Type,str,int,float==nan,unique,unique %
actor_id,int64,0,199,True,199,100.0
first_name,object,199,0,True,128,64.32
last_name,object,199,0,True,121,60.8
last_update,object,199,0,True,1,0.5
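
The implementation of `cols_info` is not shown in this notebook. For readers without access to `src/video_func`, the sketch below is a rough, hypothetical stand-in that reports only the dtype, null count and uniqueness of each column. It does not reproduce the `str`, `int` or `float==nan` columns of the real function.

```python
import pandas as pd

def summarize_columns(df):
    """Hypothetical stand-in, not the real cols_info: one summary row per column."""
    n_rows = len(df)
    return pd.DataFrame({
        "Col Type": df.dtypes.astype(str),
        "nulls": df.isna().sum(),
        "unique": df.nunique(),  # NaN is not counted as a value
        "unique %": (df.nunique() / n_rows * 100).round(2),
    })

demo = pd.DataFrame({"id": [1, 2, 3, 4], "name": ["A", "B", "A", None]})
summary = summarize_columns(demo)
print(summary)
```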


From the information above, we apply the following transformations:
* **<ins>first_name / last_name</ins>:** Apply `str.capitalize` for consistent formatting
* **<ins>last_update</ins>:** Drop it; it holds a single value and provides no useful information
* Rename the columns to match the DB

In [12]:
act.first_name = act.first_name.str.capitalize()
act.last_name = act.last_name.str.capitalize()
act = act.drop(columns='last_update')
act.columns = ['Id_Actors','Name','Last_name']

In [13]:
act.head()

Unnamed: 0,Id_Actors,Name,Last_name
0,1,Penelope,Guiness
1,2,Nick,Wahlberg
2,3,Ed,Chase
3,4,Jennifer,Davis
4,5,Johnny,Lollobrigida


We now load the Actors table into the DB and save the cleaned CSV.

In [14]:
act.to_csv('../data/clean/actor_clean.csv',index=False)
act.to_sql(name='Actors',
           con=cursor,
           if_exists='append',
           index=False);
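
Because `if_exists='append'` adds rows on every run, re-executing the cell would either insert duplicates or fail on a key constraint, depending on how the table is defined. The sketch below shows one possible guard. It uses an in-memory SQLite database as a stand-in for the MySQL engine, and `append_if_empty` is a hypothetical helper whose table-existence check is specific to SQLite.

```python
import sqlite3
import pandas as pd

def append_if_empty(df, table, conn):
    """Hypothetical guard: append only when the table is missing or empty."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (table,)
    ).fetchone()
    if exists and conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0] > 0:
        return 0  # already loaded, skip to avoid duplicate rows
    df.to_sql(name=table, con=conn, if_exists='append', index=False)
    return len(df)

conn = sqlite3.connect(':memory:')  # stand-in for the MySQL engine
actors_demo = pd.DataFrame({'Id_Actors': [1, 2],
                            'Name': ['Penelope', 'Nick'],
                            'Last_name': ['Guiness', 'Wahlberg']})
first = append_if_empty(actors_demo, 'Actors', conn)   # rows written on first run
second = append_if_empty(actors_demo, 'Actors', conn)  # rows written on re-run
total = conn.execute('SELECT COUNT(*) FROM "Actors"').fetchone()[0]
```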

### 2 - Category <a name="category"/> 
***

In [15]:
cat = pd.read_csv('../data/raw/category.csv')
cat.head(16)

Unnamed: 0,category_id,name,last_update
0,1,Action,2006-02-15 04:46:27
1,2,Animation,2006-02-15 04:46:27
2,3,Children,2006-02-15 04:46:27
3,4,Classics,2006-02-15 04:46:27
4,5,Comedy,2006-02-15 04:46:27
5,6,Documentary,2006-02-15 04:46:27
6,7,Drama,2006-02-15 04:46:27
7,8,Family,2006-02-15 04:46:27
8,9,Foreign,2006-02-15 04:46:27
9,10,Games,2006-02-15 04:46:27


In [16]:
cat.shape

(16, 3)

In [17]:
cat.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   category_id  16 non-null     int64 
 1   name         16 non-null     object
 2   last_update  16 non-null     object
dtypes: int64(1), object(2)
memory usage: 512.0+ bytes


In [18]:
cols_info(cat)

Unnamed: 0,Col Type,str,int,float==nan,unique,unique %
category_id,int64,0,16,True,16,100.0
name,object,16,0,True,16,100.0
last_update,object,16,0,True,1,6.25


In [19]:
cat.last_update.unique()

array(['2006-02-15 04:46:27'], dtype=object)
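
The same single-value check is repeated for every table. It can be wrapped in a small helper; `constant_columns` below is a hypothetical name and the data is a cut-down sample. The result should be reviewed before dropping anything, since a constant column can still be meaningful (for example `release_year` in the film table).

```python
import pandas as pd

def constant_columns(df):
    """Return the names of the columns that hold a single distinct value."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]

cat_demo = pd.DataFrame({
    "category_id": [1, 2, 3],
    "name": ["Action", "Animation", "Children"],
    "last_update": ["2006-02-15 04:46:27"] * 3,
})
to_drop = constant_columns(cat_demo)
cat_demo = cat_demo.drop(columns=to_drop)
print(to_drop)
```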

From the information above, we apply the following transformations:
* **<ins>last_update</ins>:** Drop it; it holds a single value and provides no useful information
* Add a new record for unknown categories with Id = 17 and value 'Unknown'
* Rename the columns to match the DB

In [20]:
cat = cat.drop(columns='last_update')
cat.columns = ['Id_Category','Name']
cat = pd.concat([cat, pd.DataFrame({'Id_Category':[17],'Name':['Unknown']})]).reset_index(drop=True)
cat.tail()

Unnamed: 0,Id_Category,Name
12,13,New
13,14,Sci-Fi
14,15,Sports
15,16,Travel
16,17,Unknown
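
Adding a placeholder row is done for more than one lookup table in this notebook. The sketch below shows a reusable version that computes the new id from the current maximum instead of hard-coding it. `add_unknown_row` is a hypothetical helper, and it assumes the ids are integers.

```python
import pandas as pd

def add_unknown_row(df, id_col, name_col, label="Unknown"):
    """Append a placeholder row whose id is one past the current maximum."""
    new_row = pd.DataFrame({id_col: [df[id_col].max() + 1], name_col: [label]})
    return pd.concat([df, new_row], ignore_index=True)

lan_demo = pd.DataFrame({"Id_Languages": [1, 2], "Name": ["English", "Italian"]})
lan_demo = add_unknown_row(lan_demo, "Id_Languages", "Name")
print(lan_demo)
```

Other cells still need to know the placeholder id (for example when filling nulls), so the value returned by `df[id_col].max()` after the call is worth keeping in a variable.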


We now load the Category table into the DB and save the cleaned CSV.

In [21]:
cat.to_csv('../data/clean/category_clean.csv',index=False)
cat.to_sql(name='Category',
           con=cursor,
           if_exists='append',
           index=False);

### 3 - Language <a name="language"/> 
***

In [22]:
lan = pd.read_csv('../data/raw/language.csv')
lan.head(10)

Unnamed: 0,language_id,name,last_update
0,1,English,2006-02-15 05:02:19
1,2,Italian,2006-02-15 05:02:19
2,3,Japanese,2006-02-15 05:02:19
3,4,Mandarin,2006-02-15 05:02:19
4,5,French,2006-02-15 05:02:19
5,6,German,2006-02-15 05:02:19


In [23]:
lan.shape

(6, 3)

In [24]:
lan.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   language_id  6 non-null      int64 
 1   name         6 non-null      object
 2   last_update  6 non-null      object
dtypes: int64(1), object(2)
memory usage: 272.0+ bytes


In [25]:
cols_info(lan)

Unnamed: 0,Col Type,str,int,float==nan,unique,unique %
language_id,int64,0,6,True,6,100.0
name,object,6,0,True,6,100.0
last_update,object,6,0,True,1,16.67


In [26]:
lan.last_update.unique()

array(['2006-02-15 05:02:19'], dtype=object)

From the information above, we apply the following transformations:
* Add a new record with value 'Unknown' for films whose language is not known
* **<ins>last_update</ins>:** Drop it; it holds a single value and provides no useful information
* Rename the columns to match the DB

In [27]:
lan = lan.drop(columns='last_update')
lan = pd.concat([lan, pd.DataFrame({'language_id':[7],'name':['Unknown']})]).reset_index(drop=True)
lan.columns = ['Id_Languages','Name']
lan.head(10)

Unnamed: 0,Id_Languages,Name
0,1,English
1,2,Italian
2,3,Japanese
3,4,Mandarin
4,5,French
5,6,German
6,7,Unknown


We now load the Languages table into the DB and save the cleaned CSV.

In [28]:
lan.to_csv('../data/clean/language_clean.csv',index=False)
lan.to_sql(name='languages',
           con=cursor,
           if_exists='append',
           index=False);

### 4 - Old HDD <a name="old_hdd"/> 
***

We analyze this CSV next because it provides the relationships between some of the tables, which we need in order to complete the tables that follow.

In [29]:
ohdd = pd.read_csv('../data/raw/old_HDD.csv')
ohdd.head()

Unnamed: 0,first_name,last_name,title,release_year,category_id
0,PENELOPE,GUINESS,ACADEMY DINOSAUR,2006,6
1,PENELOPE,GUINESS,ANACONDA CONFESSIONS,2006,2
2,PENELOPE,GUINESS,ANGELS LIFE,2006,13
3,PENELOPE,GUINESS,BULWORTH COMMANDMENTS,2006,10
4,PENELOPE,GUINESS,CHEAPER CLYDE,2006,14


In [30]:
ohdd.shape

(1000, 5)

In [31]:
ohdd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   first_name    1000 non-null   object
 1   last_name     1000 non-null   object
 2   title         1000 non-null   object
 3   release_year  1000 non-null   int64 
 4   category_id   1000 non-null   int64 
dtypes: int64(2), object(3)
memory usage: 39.2+ KB


In [32]:
cols_info(ohdd)

Unnamed: 0,Col Type,str,int,float==nan,unique,unique %
first_name,object,1000,0,True,38,3.8
last_name,object,1000,0,True,37,3.7
title,object,1000,0,True,614,61.4
release_year,int64,0,1000,True,1,0.1
category_id,int64,0,1000,True,16,1.6


We only have information for 614 of the 1000 films in the Films CSV.
This must be taken into account in the steps that follow.
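
One way to see which films are covered and which are not is a left merge with `indicator=True`. The sketch below uses tiny made-up frames; the column name `title` matches the CSVs, everything else is illustrative.

```python
import pandas as pd

films_demo = pd.DataFrame({"title": ["ACADEMY DINOSAUR", "ACE GOLDFINGER", "AFRICAN EGG"]})
ohdd_demo = pd.DataFrame({"title": ["ACADEMY DINOSAUR", "ACADEMY DINOSAUR", "ACE GOLDFINGER"]})

# The _merge column records whether each film title was found in ohdd
coverage = films_demo.merge(
    ohdd_demo[["title"]].drop_duplicates(),
    on="title", how="left", indicator=True,
)
matched = int((coverage["_merge"] == "both").sum())
missing_titles = coverage.loc[coverage["_merge"] == "left_only", "title"].tolist()
print(matched, missing_titles)
```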

In [33]:
# We only have information on 39 actors that can be linked to films

(ohdd.first_name + ' ' + ohdd.last_name).unique().size

39
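
Counting unique names by concatenating the two columns works when names contain no ambiguous spaces, but in edge cases two different (first name, last name) pairs can produce the same string. The sketch below, on made-up data, contrasts that with counting distinct pairs directly. Whether the two counts differ on the real file is not checked here.

```python
import pandas as pd

names = pd.DataFrame({
    "first_name": ["MARY ANN", "MARY", "ED"],
    "last_name": ["LEE", "ANN LEE", "CHASE"],
})

# Concatenation merges the first two rows into the same string
by_concat = (names.first_name + " " + names.last_name).nunique()
# Counting pairs keeps them apart
by_pairs = len(names[["first_name", "last_name"]].drop_duplicates())
print(by_concat, by_pairs)
```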

From the information above, we apply the following transformations:
* **<ins>first_name / last_name</ins>:** Apply `str.capitalize` for consistent formatting
* **<ins>title</ins>:** Apply `str.title` for consistent formatting

In [34]:
ohdd.first_name = ohdd.first_name.str.capitalize()
ohdd.last_name = ohdd.last_name.str.capitalize()
ohdd.title = ohdd.title.str.title()
ohdd.head()

Unnamed: 0,first_name,last_name,title,release_year,category_id
0,Penelope,Guiness,Academy Dinosaur,2006,6
1,Penelope,Guiness,Anaconda Confessions,2006,2
2,Penelope,Guiness,Angels Life,2006,13
3,Penelope,Guiness,Bulworth Commandments,2006,10
4,Penelope,Guiness,Cheaper Clyde,2006,14


Here we only save the CSV.

In [35]:
ohdd.to_csv('../data/clean/ohdd_clean.csv',index=False)

### 5 - Film <a name="film"/> 
***

In [36]:
film = pd.read_csv('../data/raw/film.csv')
film.head()

Unnamed: 0,film_id,title,description,release_year,language_id,original_language_id,rental_duration,rental_rate,length,replacement_cost,rating,special_features,last_update
0,1,ACADEMY DINOSAUR,A Epic Drama of a Feminist And a Mad Scientist...,2006,1,,6,0.99,86,20.99,PG,"Deleted Scenes,Behind the Scenes",2006-02-15 05:03:42
1,2,ACE GOLDFINGER,A Astounding Epistle of a Database Administrat...,2006,1,,3,4.99,48,12.99,G,"Trailers,Deleted Scenes",2006-02-15 05:03:42
2,3,ADAPTATION HOLES,A Astounding Reflection of a Lumberjack And a ...,2006,1,,7,2.99,50,18.99,NC-17,"Trailers,Deleted Scenes",2006-02-15 05:03:42
3,4,AFFAIR PREJUDICE,A Fanciful Documentary of a Frisbee And a Lumb...,2006,1,,5,2.99,117,26.99,G,"Commentaries,Behind the Scenes",2006-02-15 05:03:42
4,5,AFRICAN EGG,A Fast-Paced Documentary of a Pastry Chef And ...,2006,1,,6,2.99,130,22.99,G,Deleted Scenes,2006-02-15 05:03:42


In [37]:
film.tail()

Unnamed: 0,film_id,title,description,release_year,language_id,original_language_id,rental_duration,rental_rate,length,replacement_cost,rating,special_features,last_update
995,996,YOUNG LANGUAGE,A Unbelieveable Yarn of a Boat And a Database ...,2006,1,,6,0.99,183,9.99,G,"Trailers,Behind the Scenes",2006-02-15 05:03:42
996,997,YOUTH KICK,A Touching Drama of a Teacher And a Cat who mu...,2006,1,,4,0.99,179,14.99,NC-17,"Trailers,Behind the Scenes",2006-02-15 05:03:42
997,998,ZHIVAGO CORE,A Fateful Yarn of a Composer And a Man who mus...,2006,1,,6,0.99,105,10.99,NC-17,Deleted Scenes,2006-02-15 05:03:42
998,999,ZOOLANDER FICTION,A Fateful Reflection of a Waitress And a Boat ...,2006,1,,5,2.99,101,28.99,R,"Trailers,Deleted Scenes",2006-02-15 05:03:42
999,1000,ZORRO ARK,A Intrepid Panorama of a Mad Scientist And a B...,2006,1,,3,4.99,50,18.99,NC-17,"Trailers,Commentaries,Behind the Scenes",2006-02-15 05:03:42


In [38]:
film.shape

(1000, 13)

In [39]:
film.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   film_id               1000 non-null   int64  
 1   title                 1000 non-null   object 
 2   description           1000 non-null   object 
 3   release_year          1000 non-null   int64  
 4   language_id           1000 non-null   int64  
 5   original_language_id  0 non-null      float64
 6   rental_duration       1000 non-null   int64  
 7   rental_rate           1000 non-null   float64
 8   length                1000 non-null   int64  
 9   replacement_cost      1000 non-null   float64
 10  rating                1000 non-null   object 
 11  special_features      1000 non-null   object 
 12  last_update           1000 non-null   object 
dtypes: float64(3), int64(5), object(5)
memory usage: 101.7+ KB


In [40]:
cols_info(film)

Unnamed: 0,Col Type,Nulos,str,float,int,float==nan,unique,unique %
film_id,int64,0,0,0,1000,True,1000,100.0
title,object,0,1000,0,0,True,1000,100.0
description,object,0,1000,0,0,True,1000,100.0
release_year,int64,0,0,0,1000,True,1,0.1
language_id,int64,0,0,0,1000,True,1,0.1
original_language_id,float64,1000,0,1000,0,True,1,0.1
rental_duration,int64,0,0,0,1000,True,5,0.5
rental_rate,float64,0,0,1000,0,False,3,0.3
length,int64,0,0,0,1000,True,140,14.0
replacement_cost,float64,0,0,1000,0,False,21,2.1


From the information above, we apply the following transformations:
* **<ins>title</ins>:** Apply `str.title` for consistent formatting
* **<ins>release_year</ins>:** It has a single value, 2006. It is unlikely that every film is from 2006 (the OHDD table shows the same), but we have no information to verify this. We therefore keep 2006; updating the DB values, if needed, is left as future work.
* **<ins>original_language_id</ins>:** All values are null; we fill them with 7 so they map to the 'Unknown' value created in the Languages table
* **<ins>last_update</ins>:** Drop it; it holds a single value and provides no useful information


In [41]:
film.title = film.title.str.title()
# Fill only original_language_id, so nulls in other columns are never overwritten
film.original_language_id = film.original_language_id.fillna(7).astype(np.int8)
film = film.drop(columns='last_update')
film.head()

Unnamed: 0,film_id,title,description,release_year,language_id,original_language_id,rental_duration,rental_rate,length,replacement_cost,rating,special_features
0,1,Academy Dinosaur,A Epic Drama of a Feminist And a Mad Scientist...,2006,1,7,6,0.99,86,20.99,PG,"Deleted Scenes,Behind the Scenes"
1,2,Ace Goldfinger,A Astounding Epistle of a Database Administrat...,2006,1,7,3,4.99,48,12.99,G,"Trailers,Deleted Scenes"
2,3,Adaptation Holes,A Astounding Reflection of a Lumberjack And a ...,2006,1,7,7,2.99,50,18.99,NC-17,"Trailers,Deleted Scenes"
3,4,Affair Prejudice,A Fanciful Documentary of a Frisbee And a Lumb...,2006,1,7,5,2.99,117,26.99,G,"Commentaries,Behind the Scenes"
4,5,African Egg,A Fast-Paced Documentary of a Pastry Chef And ...,2006,1,7,6,2.99,130,22.99,G,Deleted Scenes


To give the films table the same structure as in the DB, we need to:
* Add the Category_Id column and fill in each film's category from the ohdd table
* Drop the language columns. We create the film_lan df, which will let us build the intermediate language tables


In [42]:
# We have seen that we can only obtain the category for 614 films
# Let's see which ones they are

film_ohdd = pd.merge(left=film[['film_id','title']],
                     right=ohdd[['title','category_id']].drop_duplicates(),
                     how='inner',
                     left_on='title',
                     right_on='title')
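
The merge above assumes each title appears with a single category in ohdd. pandas can enforce that assumption through the `validate` argument of `pd.merge`, which raises `MergeError` when the keys are not unique. The sketch below uses deliberately inconsistent, made-up data to show the failure mode.

```python
import pandas as pd
from pandas.errors import MergeError

film_demo = pd.DataFrame({"film_id": [1, 2],
                          "title": ["Academy Dinosaur", "Ace Goldfinger"]})
# Deliberately inconsistent: the same title appears with two categories
ohdd_demo = pd.DataFrame({
    "title": ["Academy Dinosaur", "Academy Dinosaur", "Ace Goldfinger"],
    "category_id": [6, 7, 11],
})

try:
    pd.merge(film_demo, ohdd_demo.drop_duplicates(), on="title",
             how="inner", validate="one_to_one")
    conflict_detected = False
except MergeError:
    conflict_detected = True  # a title maps to more than one category
print(conflict_detected)
```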

In [43]:
film_ohdd.tail()

Unnamed: 0,film_id,title,category_id
609,993,Wrong Behavior,3
610,994,Wyoming Storm,13
611,996,Young Language,6
612,997,Youth Kick,12
613,998,Zhivago Core,11


In [44]:
film.insert(loc=4,column='Category_Id',value=17)  # 17 is the id we chose for the unknown category

In [45]:
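# Map categories by film_id instead of relying on matching row order
cat_by_film = film_ohdd.set_index('film_id')['category_id']
mask = film.film_id.isin(cat_by_film.index)
film.loc[mask, 'Category_Id'] = film.loc[mask, 'film_id'].map(cat_by_film)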

In [46]:
film.head()

Unnamed: 0,film_id,title,description,release_year,Category_Id,language_id,original_language_id,rental_duration,rental_rate,length,replacement_cost,rating,special_features
0,1,Academy Dinosaur,A Epic Drama of a Feminist And a Mad Scientist...,2006,6,1,7,6,0.99,86,20.99,PG,"Deleted Scenes,Behind the Scenes"
1,2,Ace Goldfinger,A Astounding Epistle of a Database Administrat...,2006,11,1,7,3,4.99,48,12.99,G,"Trailers,Deleted Scenes"
2,3,Adaptation Holes,A Astounding Reflection of a Lumberjack And a ...,2006,6,1,7,7,2.99,50,18.99,NC-17,"Trailers,Deleted Scenes"
3,4,Affair Prejudice,A Fanciful Documentary of a Frisbee And a Lumb...,2006,17,1,7,5,2.99,117,26.99,G,"Commentaries,Behind the Scenes"
4,5,African Egg,A Fast-Paced Documentary of a Pastry Chef And ...,2006,17,1,7,6,2.99,130,22.99,G,Deleted Scenes


In [47]:
film_lan = film[['film_id','language_id','original_language_id']].copy()
film = film.drop(columns=['language_id','original_language_id'])
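
The notebook does not show how `film_lan` becomes the intermediate language table. One possible shape, sketched below, is one row per (film, language) pair with a flag for the original language. The output column names `Id_Languages` and `is_original` are assumptions and should be adapted to the real schema.

```python
import pandas as pd

film_lan_demo = pd.DataFrame({
    "film_id": [1, 2],
    "language_id": [1, 1],
    "original_language_id": [7, 7],
})

# One row per (film, language) pair; 'role' records the source column
junction = film_lan_demo.melt(
    id_vars="film_id",
    value_vars=["language_id", "original_language_id"],
    var_name="role",
    value_name="Id_Languages",
)
junction["is_original"] = junction["role"].eq("original_language_id")
junction = (junction.drop(columns="role")
            .sort_values(["film_id", "is_original"])
            .reset_index(drop=True))
print(junction)
```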

In [48]:
# Rename the columns to match the DB
film.columns = ['Id_films','Title','Description','Release_year','Category_Id','Rent_duration','Rent_rate','Length','Replacement_cost','Rating','Special_Features']
film.head()

Unnamed: 0,Id_films,Title,Description,Release_year,Category_Id,Rent_duration,Rent_rate,Length,Replacement_cost,Rating,Special_Features
0,1,Academy Dinosaur,A Epic Drama of a Feminist And a Mad Scientist...,2006,6,6,0.99,86,20.99,PG,"Deleted Scenes,Behind the Scenes"
1,2,Ace Goldfinger,A Astounding Epistle of a Database Administrat...,2006,11,3,4.99,48,12.99,G,"Trailers,Deleted Scenes"
2,3,Adaptation Holes,A Astounding Reflection of a Lumberjack And a ...,2006,6,7,2.99,50,18.99,NC-17,"Trailers,Deleted Scenes"
3,4,Affair Prejudice,A Fanciful Documentary of a Frisbee And a Lumb...,2006,17,5,2.99,117,26.99,G,"Commentaries,Behind the Scenes"
4,5,African Egg,A Fast-Paced Documentary of a Pastry Chef And ...,2006,17,6,2.99,130,22.99,G,Deleted Scenes


In [49]:
film.to_csv('../data/clean/film_clean.csv',index=False)
film.to_sql(name='films',
           con=cursor,
           if_exists='append',
           index=False);

### 6 - Inventory <a name="inventory"/> 
***

In [50]:
inv = pd.read_csv('../data/raw/inventory.csv')
inv.head()

Unnamed: 0,inventory_id,film_id,store_id,last_update
0,1,1,1,2006-02-15 05:09:17
1,2,1,1,2006-02-15 05:09:17
2,3,1,1,2006-02-15 05:09:17
3,4,1,1,2006-02-15 05:09:17
4,5,1,2,2006-02-15 05:09:17


In [51]:
inv.shape

(1000, 4)

In [52]:
inv.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   inventory_id  1000 non-null   int64 
 1   film_id       1000 non-null   int64 
 2   store_id      1000 non-null   int64 
 3   last_update   1000 non-null   object
dtypes: int64(3), object(1)
memory usage: 31.4+ KB


In [53]:
cols_info(inv)

Unnamed: 0,Col Type,str,int,float==nan,unique,unique %
inventory_id,int64,0,1000,True,1000,100.0
film_id,int64,0,1000,True,207,20.7
store_id,int64,0,1000,True,2,0.2
last_update,object,1000,0,True,1,0.1


In [None]:
inv.last_update.unique()

In [None]:
inv.film_id.unique()

In [None]:
inv.film_id.unique().size

In [None]:
inv.store_id.unique()
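
Since inventory rows reference films, it is worth checking that every `film_id` in the inventory exists in the film table before loading. The sketch below shows such a check on made-up data. `orphan_keys` is a hypothetical helper, and in the real notebook the film key has already been renamed to `Id_films`.

```python
import pandas as pd

def orphan_keys(child, parent, key_child, key_parent):
    """Return the child key values that have no match in the parent table."""
    missing = ~child[key_child].isin(parent[key_parent])
    return sorted(child.loc[missing, key_child].unique().tolist())

film_demo = pd.DataFrame({"film_id": [1, 2, 3]})
inv_demo = pd.DataFrame({"inventory_id": [1, 2, 3, 4], "film_id": [1, 1, 2, 9]})
orphans = orphan_keys(inv_demo, film_demo, "film_id", "film_id")
print(orphans)
```

The same helper could be applied to `rent.inventory_id` against `inv.inventory_id`.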

### 7 - Rental <a name="rental"/> 
***

In [54]:
rent = pd.read_csv('../data/raw/rental.csv')
rent.head()

Unnamed: 0,rental_id,rental_date,inventory_id,customer_id,return_date,staff_id,last_update
0,1,2005-05-24 22:53:30,367,130,2005-05-26 22:04:30,1,2006-02-15 21:30:53
1,2,2005-05-24 22:54:33,1525,459,2005-05-28 19:40:33,1,2006-02-15 21:30:53
2,3,2005-05-24 23:03:39,1711,408,2005-06-01 22:12:39,1,2006-02-15 21:30:53
3,4,2005-05-24 23:04:41,2452,333,2005-06-03 01:43:41,2,2006-02-15 21:30:53
4,5,2005-05-24 23:05:21,2079,222,2005-06-02 04:33:21,1,2006-02-15 21:30:53


In [55]:
rent.shape

(1000, 7)

In [56]:
rent.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   rental_id     1000 non-null   int64 
 1   rental_date   1000 non-null   object
 2   inventory_id  1000 non-null   int64 
 3   customer_id   1000 non-null   int64 
 4   return_date   1000 non-null   object
 5   staff_id      1000 non-null   int64 
 6   last_update   1000 non-null   object
dtypes: int64(4), object(3)
memory usage: 54.8+ KB


In [57]:
cols_info(rent)

Unnamed: 0,Col Type,str,int,float==nan,unique,unique %
rental_id,int64,0,1000,True,1000,100.0
rental_date,object,1000,0,True,999,99.9
inventory_id,int64,0,1000,True,1000,100.0
customer_id,int64,0,1000,True,485,48.5
return_date,object,1000,0,True,997,99.7
staff_id,int64,0,1000,True,2,0.2
last_update,object,1000,0,True,1,0.1


In [None]:
rent.inventory_id.unique().size

In [None]:
rent.customer_id.unique().size

In [None]:
rent.rental_date.max()

In [None]:
rent.rental_date.min()
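
The date columns are read as strings (dtype `object`). Comparing them with `min` and `max` works for this ISO-like format, but date arithmetic needs real datetimes. The sketch below, using the first two rows shown by `rent.head()`, parses both columns and derives a rental length; the `days_out` column is an illustrative addition.

```python
import pandas as pd

rent_demo = pd.DataFrame({
    "rental_date": ["2005-05-24 22:53:30", "2005-05-24 22:54:33"],
    "return_date": ["2005-05-26 22:04:30", "2005-05-28 19:40:33"],
})

# Parse with an explicit format so malformed values raise instead of passing silently
for col in ["rental_date", "return_date"]:
    rent_demo[col] = pd.to_datetime(rent_demo[col], format="%Y-%m-%d %H:%M:%S")

# Whole days between rental and return
rent_demo["days_out"] = (rent_demo["return_date"] - rent_demo["rental_date"]).dt.days
print(rent_demo)
```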