# Databases in Python

## Table of contents
1. [Accessing relational databases](#sql)
2. [Creating relational databases](#crear)
3. [NoSQL databases](#nosql)

<a id="sql"></a>
## Accessing relational databases

We can access MySQL databases with the `pymysql` library. To install it, run the following in Anaconda Prompt:  
`conda install -c anaconda pymysql`

In [1]:
import pymysql

#### Example 1
We are going to connect to the NBA database, which contains game statistics for one season.  
We will use the following parameters:  
* server: relational.fit.cvut.cz
* user: guest
* password: relational  
* database: NBA  

<img src="https://relational.fit.cvut.cz/assets/img/datasets-generated/NBA.svg" width="400">  

First we create a connection to the database.

In [2]:
database_host = 'relational.fit.cvut.cz'
username = 'guest'
password = 'relational'
database_name = 'NBA'

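# Keyword arguments are used because recent PyMySQL releases no longer accept positional ones
db = pymysql.connect(host=database_host, user=username, password=password, database=database_name)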
cursor = db.cursor()

The `connect()` function creates a connection to the database. A *cursor* lets us run operations on the data stored in it.

<img src='https://i.ibb.co/L8HH0G5/cursor.png'>  

Once the cursor has been created, we can start running commands against the database with the `execute()` method.

After running a query, we use `fetchone()` (the next row, which is the first one right after `execute()`) or `fetchall()` (all remaining rows) to view the results. To close the cursor or the connection, we use the `close()` method.

In [6]:
cursor.execute("SELECT * FROM Player")
cursor.fetchall()

((1, 'Nicolas Batum'),
 (2, 'LaMarcus Aldridge'),
 (3, 'Robin Lopez'),
 (4, 'Wesley Matthews'),
 (5, 'Damian Lillard'),
 (6, 'Thomas Robinson'),
 (7, 'Maurice Williams'),
 (8, 'Will Barton'),
 (9, 'Dorell Wright'),
 (10, 'Earl Watson'),
 (11, 'CJ McCollum'),
 (12, 'Meyers Leonard'),
 (13, 'Victor Claver'),
 (14, 'Kent Bazemore'),
 (15, 'Pau Gasol'),
 (16, 'Chris Kaman'),
 (17, 'Jodie Meeks'),
 (18, 'Kendall Marshall'),
 (19, 'Steve Nash'),
 (20, 'Xavier Henry'),
 (21, 'Robert Sacre'),
 (22, 'Ryan Kelly'),
 (23, 'Nick Young'),
 (24, 'Marshon Brooks'),
 (25, 'Jordan Hill'),
 (26, 'Wesley Johnson'),
 (27, 'Andre Iguodala'),
 (28, 'Draymond Green'),
 (29, "Jermaine O'Neal"),
 (30, 'Klay Thompson'),
 (31, 'Stephen Curry'),
 (32, 'Marreese Speights'),
 (33, 'Harrison Barnes'),
 (34, 'Steve Blake'),
 (35, 'Jordan Crawford'),
 (36, 'Hilton Armstrong'),
 (37, 'Andrew Bogut'),
 (38, 'David Lee'),
 (39, 'Shawn Marion'),
 (40, 'Dirk Nowitzki'),
 (41, 'Samuel Dalembert'),
 (42, 'Monta Ellis'),
 (43

We can list the tables in the database with the `SHOW TABLES` query.

In [4]:
cursor.execute("SHOW TABLES")
cursor.fetchall()

(('Actions',),
 ('Game',),
 ('Player',),
 ('Team',),
 ('joined_drafted_all_players_original',))

In [7]:
cursor.close()
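
The cursor workflow above can also be tried offline. The sketch below uses Python's built-in `sqlite3` module as a stand-in for `pymysql`. Both follow the DB-API 2.0 interface, so `execute()`, `fetchone()` and `fetchall()` behave alike. The in-memory table and its three rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the remote MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical miniature version of the Player table.
cursor.execute("CREATE TABLE Player (PlayerId INTEGER PRIMARY KEY, PlayerName TEXT)")
cursor.executemany(
    "INSERT INTO Player (PlayerId, PlayerName) VALUES (?, ?)",
    [(1, "Nicolas Batum"), (2, "LaMarcus Aldridge"), (3, "Robin Lopez")],
)

cursor.execute("SELECT * FROM Player ORDER BY PlayerId")
first_row = cursor.fetchone()        # next unread row of the result set
remaining_rows = cursor.fetchall()   # every row that has not been fetched yet
after_exhausted = cursor.fetchone()  # None once the result set is consumed

print(first_row)
print(remaining_rows)

cursor.close()
conn.close()
```

Note that `sqlite3` uses `?` as the parameter placeholder, whereas `pymysql` uses `%s`.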

The pandas `read_sql()` function lets us build dataframes directly from queries. With it there is no need to create a cursor.

In [8]:
import pandas as pd
query = "SELECT * FROM Actions"
df = pd.read_sql(query, db)
df

Unnamed: 0,GameId,TeamId,PlayerId,Minutes,FieldGoalsMade,FieldGoalAttempts,3PointsMade,3PointAttempts,FreeThrowsMade,FreeThrowAttempts,...,DefensiveRebounds,TotalRebounds,Assists,PersonalFouls,Steals,Turnovers,BlockedShots,BlocksAgainst,Points,Starter
0,1,7,78,2605,5,14,3,3,0,0,...,3,3,4,2,1,1,0,0,13,1
1,1,7,79,2359,11,19,3,3,8,8,...,8,8,3,2,3,3,1,0,34,1
2,1,7,80,2104,6,7,3,3,3,8,...,7,9,1,3,1,1,2,0,15,1
3,1,7,81,1392,1,5,3,3,0,0,...,2,2,0,4,1,0,0,0,2,1
4,1,7,82,2124,5,8,3,3,1,2,...,3,3,6,1,1,4,0,0,12,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
762,30,25,316,0,0,0,3,3,0,0,...,0,0,0,0,0,0,0,0,0,0
763,30,25,317,0,0,0,3,3,0,0,...,0,0,0,0,0,0,0,0,0,0
764,30,25,318,0,0,0,3,3,0,0,...,0,0,0,0,0,0,0,0,0,0
765,30,17,344,0,0,0,3,3,0,0,...,0,0,0,0,0,0,0,0,0,0


We get the players with more than 5 assists in a single game.

In [14]:
query = '''
SELECT  Player.PlayerName, 
        Actions.Assists
FROM Actions
JOIN Player
ON Actions.PlayerId = Player.PlayerId
WHERE Assists > 5
'''

Asistencias5 = pd.read_sql(query, db)
Asistencias5

Unnamed: 0,PlayerName,Assists
0,Nicolas Batum,7
1,LaMarcus Aldridge,6
2,Wesley Matthews,6
3,Damian Lillard,8
4,Kent Bazemore,6
...,...,...
64,Kyrie Irving,8
65,Jameer Nelson,7
66,Tony Parker,8
67,Goran Dragic,8


We get the top 10 players by total assists in the season.

In [18]:
query = '''
SELECT PlayerName, SUM(Assists) AS TotalAssists

FROM Actions
JOIN Player
ON Actions.PlayerId = Player.PlayerId
GROUP BY PlayerName
ORDER BY TotalAssists DESC
LIMIT 10
'''

Asistencias = pd.read_sql(query, db)
Asistencias

Unnamed: 0,PlayerName,TotalAssists
0,Chris Paul,27.0
1,Kemba Walker,25.0
2,Brandon Jennings,22.0
3,Ty Lawson,21.0
4,Demar DeRozan,20.0
5,Ramon Sessions,17.0
6,Andre Miller,17.0
7,Lebron James,16.0
8,DJ Augustin,16.0
9,John Wall,16.0
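
To see how `JOIN`, `GROUP BY`, `ORDER BY` and `LIMIT` combine in the query above, here is a minimal sketch on a toy dataset. It runs against an in-memory SQLite database rather than the NBA server, and the player names and assist counts are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature Player and Actions tables.
conn.executescript("""
CREATE TABLE Player (PlayerId INTEGER PRIMARY KEY, PlayerName TEXT);
CREATE TABLE Actions (GameId INTEGER, PlayerId INTEGER, Assists INTEGER);
INSERT INTO Player VALUES (1, 'Ana'), (2, 'Luis'), (3, 'Marta');
INSERT INTO Actions VALUES
    (1, 1, 4), (2, 1, 6),   -- Ana: 10 assists over two games
    (1, 2, 7),              -- Luis: 7 assists
    (1, 3, 2), (2, 3, 1);   -- Marta: 3 assists
""")

# Join each action to its player, add up assists per player,
# sort from highest to lowest and keep only the first two rows.
top_two = conn.execute("""
SELECT PlayerName, SUM(Assists) AS TotalAssists
FROM Actions
JOIN Player ON Actions.PlayerId = Player.PlayerId
GROUP BY PlayerName
ORDER BY TotalAssists DESC
LIMIT 2
""").fetchall()

print(top_two)
conn.close()
```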


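We get the top 10 teams by average points per game. The inner query adds up the points of each team in each game, and the outer query averages those totals per team.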

In [24]:
query = '''
SELECT TeamName,AVG(TotalPoints) AS AvgPoints
FROM(
    SELECT TeamName, GameId, SUM(Points) as TotalPoints
    FROM Actions
    JOIN Team
    ON Actions.TeamId = Team.TeamId
    GROUP BY TeamName, GameId
    )TAB
GROUP BY TeamName
ORDER BY AvgPoints DESC
LIMIT 10
'''

Puntos = pd.read_sql(query,db)
Puntos

Unnamed: 0,TeamName,AvgPoints
0,Portland Trail Blazers,124.0
1,Cleveland Cavaliers,119.0
2,Dallas Mavericks,116.5
3,Golden State Warriors,112.5
4,Los Angeles Clippers,111.0
5,Charlotte Bobcats,108.3333
6,Phoenix Suns,108.0
7,Denver Nuggets,107.0
8,Los Angeles Lakers,107.0
9,Oklahoma City Thunder,106.0
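
The two aggregation levels of this query are easier to follow on a small example. The sketch below is an illustration on invented data (two hypothetical teams, in-memory SQLite instead of the NBA server): it first shows the per-game totals produced by the inner query and then the per-team averages produced by the outer one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data: each row is one player's points in one game.
conn.executescript("""
CREATE TABLE Team (TeamId INTEGER PRIMARY KEY, TeamName TEXT);
CREATE TABLE Actions (GameId INTEGER, TeamId INTEGER, PlayerId INTEGER, Points INTEGER);
INSERT INTO Team VALUES (1, 'Reds'), (2, 'Blues');
INSERT INTO Actions VALUES
    (1, 1, 10, 60), (1, 1, 11, 40),   -- Reds, game 1: 100 points
    (2, 1, 10, 50), (2, 1, 11, 30),   -- Reds, game 2: 80 points
    (1, 2, 20, 70), (1, 2, 21, 25);   -- Blues, game 1: 95 points
""")

# Step 1 (inner query): total points per team and game.
per_game = conn.execute("""
SELECT TeamName, GameId, SUM(Points) AS TotalPoints
FROM Actions JOIN Team ON Actions.TeamId = Team.TeamId
GROUP BY TeamName, GameId
ORDER BY TeamName, GameId
""").fetchall()

# Step 2 (outer query): average of those per-game totals for each team.
averages = conn.execute("""
SELECT TeamName, AVG(TotalPoints) AS AvgPoints
FROM (
    SELECT TeamName, GameId, SUM(Points) AS TotalPoints
    FROM Actions JOIN Team ON Actions.TeamId = Team.TeamId
    GROUP BY TeamName, GameId
) AS TAB
GROUP BY TeamName
ORDER BY AvgPoints DESC
""").fetchall()

print(per_game)
print(averages)
conn.close()
```

Averaging `Points` directly would give the mean per player row, not per game, which is why the subquery is needed.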


#### Example 2
We are going to connect to the database of a company's employees.  
* server: relational.fit.cvut.cz
* user: guest
* password: relational  
* database: employee
<img src = 'https://relational.fit.cvut.cz/assets/img/datasets-generated/employee.svg'>

In [25]:
database_host = 'relational.fit.cvut.cz'
username = 'guest'
password = 'relational'
database_name = 'employee'

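# Keyword arguments are used because recent PyMySQL releases no longer accept positional ones
db = pymysql.connect(host=database_host, user=username, password=password, database=database_name)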

In [27]:
query = "SHOW TABLES"
pd.read_sql(query,db)

Unnamed: 0,Tables_in_employee
0,departments
1,dept_emp
2,dept_manager
3,employees
4,salaries
5,titles


Goal: get the maximum, minimum and average salary by gender and job title. We first inspect the tables involved.

In [28]:
query ='''
SELECT * FROM titles
LIMIT 10
'''
pd.read_sql(query,db)

Unnamed: 0,emp_no,title,from_date,to_date
0,10001,Senior Engineer,1986-06-26,9999-01-01
1,10002,Staff,1996-08-03,9999-01-01
2,10003,Senior Engineer,1995-12-03,9999-01-01
3,10004,Engineer,1986-12-01,1995-12-01
4,10004,Senior Engineer,1995-12-01,9999-01-01
5,10005,Senior Staff,1996-09-12,9999-01-01
6,10005,Staff,1989-09-12,1996-09-12
7,10006,Senior Engineer,1990-08-05,9999-01-01
8,10007,Senior Staff,1996-02-11,9999-01-01
9,10007,Staff,1989-02-10,1996-02-11


In [29]:
query ='''
SELECT * FROM salaries
LIMIT 10
'''
pd.read_sql(query,db)

Unnamed: 0,emp_no,salary,from_date,to_date
0,10001,60117,1986-06-26,1987-06-26
1,10001,62102,1987-06-26,1988-06-25
2,10001,66074,1988-06-25,1989-06-25
3,10001,66596,1989-06-25,1990-06-25
4,10001,66961,1990-06-25,1991-06-25
5,10001,71046,1991-06-25,1992-06-24
6,10001,74333,1992-06-24,1993-06-24
7,10001,75286,1993-06-24,1994-06-24
8,10001,75994,1994-06-24,1995-06-24
9,10001,76884,1995-06-24,1996-06-23


In [38]:
query ='''
SELECT * FROM employees
LIMIT 10
'''
pd.read_sql(query,db)

Unnamed: 0,emp_no,birth_date,first_name,last_name,gender,hire_date
0,10001,1953-09-02,Georgi,Facello,M,1986-06-26
1,10002,1964-06-02,Bezalel,Simmel,F,1985-11-21
2,10003,1959-12-03,Parto,Bamford,M,1986-08-28
3,10004,1954-05-01,Chirstian,Koblick,M,1986-12-01
4,10005,1955-01-21,Kyoichi,Maliniak,M,1989-09-12
5,10006,1953-04-20,Anneke,Preusig,F,1989-06-02
6,10007,1957-05-23,Tzvetan,Zielinski,F,1989-02-10
7,10008,1958-02-19,Saniya,Kalloufi,M,1994-09-15
8,10009,1952-04-19,Sumant,Peac,F,1985-02-18
9,10010,1963-06-01,Duangkaew,Piveteau,F,1989-08-24


In [42]:
query = '''
SELECT title, gender, max(salary), min(salary), avg(salary),count(salary)

FROM employees emp
JOIN salaries sal
ON emp.emp_no = sal.emp_no
JOIN titles titl
ON emp.emp_no = titl.emp_no
WHERE sal.to_date = '9999-01-01'
AND titl.to_date = '9999-01-01'
GROUP BY title, gender
'''

Salarios = pd.read_sql(query, db)

In [43]:
Salarios

Unnamed: 0,title,gender,max(salary),min(salary),avg(salary),count(salary)
0,Assistant Engineer,M,117636,39827,57197.9674,2148
1,Assistant Engineer,F,106340,39469,57495.9861,1440
2,Engineer,M,130939,38942,59592.9683,18571
3,Engineer,F,115444,39519,59617.3549,12412
4,Manager,M,106491,56654,79350.6,5
5,Manager,F,83457,65400,75690.0,4
6,Senior Engineer,M,140784,39285,70869.9085,51533
7,Senior Engineer,F,138273,39476,70753.8341,34406
8,Senior Staff,M,158220,39012,80735.4795,49232
9,Senior Staff,F,152710,39227,80662.9816,32792


In [46]:
pd.read_sql('''
SELECT title, count(title)
FROM titles
WHERE to_date = '9999-01-01'
GROUP BY title
''', db)

Unnamed: 0,title,count(title)
0,Assistant Engineer,3588
1,Engineer,30983
2,Manager,9
3,Senior Engineer,85939
4,Senior Staff,82024
5,Staff,25526
6,Technique Leader,12055


<a id="crear"></a>
## Creating relational databases

We can create our own databases using SQLite.

In [47]:
import sqlite3

In [48]:
conn = sqlite3.connect('my_database.sqlite')
cursor = conn.cursor()

We can create tables with the `CREATE TABLE` command.

In [49]:
cursor.execute('''
CREATE TABLE SCHOOL 
(ID INT PRIMARY KEY NOT NULL,
 NAME TEXT NOT NULL,
 AGE INT NOT NULL,
 CITY CHAR(50),
 MARKS INT
)
''')

<sqlite3.Cursor at 0x20fd98315e0>

To insert values we use `INSERT INTO`. Whenever we change the contents of our database, we have to confirm the change with `commit()`.

In [50]:
cursor.execute('''
INSERT INTO SCHOOL (ID, NAME, AGE, CITY, MARKS)
VALUES (1, 'Luis', 24, 'Madrid', 8)
''')

cursor.execute('''
INSERT INTO SCHOOL (ID, NAME, AGE, CITY, MARKS)
VALUES (2, 'Ana', 34, 'Bilbao', 9)
''')

conn.commit()

In [51]:
# Run queries
pd.read_sql('SELECT * FROM SCHOOL', conn)

Unnamed: 0,ID,NAME,AGE,CITY,MARKS
0,1,Luis,24,Madrid,8
1,2,Ana,34,Bilbao,9
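
A minimal sketch of why `commit()` matters: changes made through one connection are not visible to other connections until they are committed. The example assumes a throwaway database file in a temporary directory (the name `demo.sqlite` is hypothetical) and a reduced version of the SCHOOL table.

```python
import os
import sqlite3
import tempfile


def count_rows(connection):
    """Return the number of rows currently visible in SCHOOL."""
    return connection.execute("SELECT COUNT(*) FROM SCHOOL").fetchall()[0][0]


with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "demo.sqlite")  # hypothetical file name

    writer = sqlite3.connect(db_path)  # connection that makes the changes
    reader = sqlite3.connect(db_path)  # second, independent connection

    writer.execute("CREATE TABLE SCHOOL (ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL)")
    writer.commit()

    # Parameters are passed separately instead of being pasted into the SQL string.
    writer.execute("INSERT INTO SCHOOL (ID, NAME) VALUES (?, ?)", (1, "Luis"))

    rows_before_commit = count_rows(reader)  # the insert is still pending
    writer.commit()
    rows_after_commit = count_rows(reader)   # now the new row is visible

    reader.close()
    writer.close()

print(rows_before_commit, rows_after_commit)
```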


We can also create tables from dataframes.

In [52]:
Salarios

Unnamed: 0,title,gender,max(salary),min(salary),avg(salary),count(salary)
0,Assistant Engineer,M,117636,39827,57197.9674,2148
1,Assistant Engineer,F,106340,39469,57495.9861,1440
2,Engineer,M,130939,38942,59592.9683,18571
3,Engineer,F,115444,39519,59617.3549,12412
4,Manager,M,106491,56654,79350.6,5
5,Manager,F,83457,65400,75690.0,4
6,Senior Engineer,M,140784,39285,70869.9085,51533
7,Senior Engineer,F,138273,39476,70753.8341,34406
8,Senior Staff,M,158220,39012,80735.4795,49232
9,Senior Staff,F,152710,39227,80662.9816,32792


In [53]:
Salarios.to_sql('SALARIOS',conn, index=False)

In [55]:
# List tables
pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", conn)

Unnamed: 0,name
0,SCHOOL
1,SALARIOS


In [56]:
pd.read_sql('SELECT * FROM SALARIOS',conn)

Unnamed: 0,title,gender,max(salary),min(salary),avg(salary),count(salary)
0,Assistant Engineer,M,117636,39827,57197.9674,2148
1,Assistant Engineer,F,106340,39469,57495.9861,1440
2,Engineer,M,130939,38942,59592.9683,18571
3,Engineer,F,115444,39519,59617.3549,12412
4,Manager,M,106491,56654,79350.6,5
5,Manager,F,83457,65400,75690.0,4
6,Senior Engineer,M,140784,39285,70869.9085,51533
7,Senior Engineer,F,138273,39476,70753.8341,34406
8,Senior Staff,M,158220,39012,80735.4795,49232
9,Senior Staff,F,152710,39227,80662.9816,32792
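
Calling `to_sql()` a second time with the same table name raises an error by default. The sketch below shows the `if_exists` parameter of `DataFrame.to_sql()` on a small made-up dataframe (a stand-in for `Salarios`) and an in-memory SQLite database.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# Hypothetical data standing in for the Salarios dataframe.
df = pd.DataFrame({"title": ["Engineer", "Staff"], "avg_salary": [59600.5, 67000.0]})

# First write: the table does not exist yet, so it is created.
df.to_sql("SALARIOS", conn, index=False)

# 'replace' drops the existing table and writes the dataframe again (2 rows).
df.to_sql("SALARIOS", conn, index=False, if_exists="replace")

# 'append' keeps the existing rows and adds the dataframe's rows (4 rows in total).
df.to_sql("SALARIOS", conn, index=False, if_exists="append")

round_trip = pd.read_sql("SELECT * FROM SALARIOS", conn)
print(round_trip)
conn.close()
```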


Sometimes it is useful to give a query a name so that we do not have to rewrite it every time we need its result. For this we use the `CREATE VIEW` statement. A view is a stored query that behaves like a virtual table: it does not hold a copy of the data, and its rows are computed from the underlying table each time the view is queried.

In [60]:
cursor.execute('''
CREATE VIEW INGENIEROS2 AS
SELECT * FROM SALARIOS
WHERE title = 'Engineer'
''')

conn.commit()

In [61]:
pd.read_sql('SELECT * FROM INGENIEROS2',conn)

Unnamed: 0,title,gender,max(salary),min(salary),avg(salary),count(salary)
0,Engineer,M,130939,38942,59592.9683,18571
1,Engineer,F,115444,39519,59617.3549,12412
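
A short sketch of how a view behaves, using an in-memory database and a reduced, made-up version of the SALARIOS table: rows inserted into the base table after the view was created still show up when the view is queried, and the view is registered in `sqlite_master` with type `view` rather than `table`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical reduced SALARIOS table plus a view that filters it.
conn.executescript("""
CREATE TABLE SALARIOS (title TEXT, gender TEXT, avg_salary REAL);
INSERT INTO SALARIOS VALUES ('Engineer', 'M', 59593.0), ('Manager', 'F', 75690.0);
CREATE VIEW INGENIEROS AS
SELECT * FROM SALARIOS WHERE title = 'Engineer';
""")

rows_before = conn.execute("SELECT COUNT(*) FROM INGENIEROS").fetchone()[0]

# Insert into the base table only; the view itself is never modified.
conn.execute("INSERT INTO SALARIOS VALUES ('Engineer', 'F', 59617.0)")
rows_after = conn.execute("SELECT COUNT(*) FROM INGENIEROS").fetchone()[0]

# Views are listed under type 'view', not 'table'.
views = conn.execute("SELECT name FROM sqlite_master WHERE type = 'view'").fetchall()

print(rows_before, rows_after, views)
conn.close()
```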


In [62]:
# List tables (views are stored with type 'view', so INGENIEROS2 is not listed)
pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", conn)

Unnamed: 0,name
0,SCHOOL
1,SALARIOS


To delete tables we use the `DROP TABLE` command.

In [63]:
cursor.execute('DROP TABLE IF EXISTS SALARIOS')

<sqlite3.Cursor at 0x20fd98315e0>

In [64]:
# List tables
pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", conn)

Unnamed: 0,name
0,SCHOOL


To update records we use the `UPDATE` command.

In [65]:
conn.execute('UPDATE SCHOOL SET MARKS=5 WHERE ID=2')

<sqlite3.Cursor at 0x20fd989a880>

In [66]:
pd.read_sql('SELECT * FROM SCHOOL',conn)

Unnamed: 0,ID,NAME,AGE,CITY,MARKS
0,1,Luis,24,Madrid,8
1,2,Ana,34,Bilbao,5


To delete records we use `DELETE`.

In [67]:
conn.execute('DELETE FROM SCHOOL WHERE ID=2')
conn.commit()

In [68]:
pd.read_sql('SELECT * FROM SCHOOL', conn)

Unnamed: 0,ID,NAME,AGE,CITY,MARKS
0,1,Luis,24,Madrid,8
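
The `UPDATE` and `DELETE` statements above can also take their values as parameters, and the cursor returned by `execute()` reports how many rows were affected through `rowcount`. A minimal sketch on an in-memory copy of a reduced SCHOOL table (the data is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SCHOOL (ID INTEGER PRIMARY KEY, NAME TEXT, MARKS INTEGER)")
conn.executemany("INSERT INTO SCHOOL VALUES (?, ?, ?)", [(1, "Luis", 8), (2, "Ana", 9)])

# rowcount tells us how many rows each statement changed.
updated = conn.execute("UPDATE SCHOOL SET MARKS = ? WHERE ID = ?", (5, 2)).rowcount
deleted = conn.execute("DELETE FROM SCHOOL WHERE ID = ?", (1,)).rowcount
missing = conn.execute("DELETE FROM SCHOOL WHERE ID = ?", (99,)).rowcount  # no such ID
conn.commit()

remaining = conn.execute("SELECT ID, NAME, MARKS FROM SCHOOL").fetchall()
print(updated, deleted, missing, remaining)
conn.close()
```

Checking `rowcount` is a cheap way to notice a `WHERE` clause that matched nothing.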


There are free applications, such as [DB Browser for SQLite](https://sqlitebrowser.org/), that let us work with SQLite databases from a graphical interface.

<a id="nosql"></a>
## Extra material: NoSQL databases (MongoDB)

The main differences between SQL and MongoDB are the following: 
<img src='http://4.bp.blogspot.com/-edz2_QrFvCE/UnzBhKZE3FI/AAAAAAAAAEs/bTEsqnZFTXw/s1600/SQL-MongoDB+Correspondence.PNG'>

We are going to connect to a MongoDB database, for which we need to install the following libraries:  
`conda install -c anaconda pymongo`  
`conda install -c anaconda dnspython`

In [69]:
from pymongo import MongoClient
import dns

In [70]:
client = MongoClient("mongodb+srv://test:test@cluster0-czvtb.mongodb.net/admin?retryWrites=true&w=majority")

We connect to the [Sample Airbnb](https://docs.atlas.mongodb.com/sample-data/sample-airbnb/) database. It contains a single collection called listingsAndReviews, whose documents describe vacation rental listings on Airbnb.


In [71]:
db = client.get_database('sample_airbnb')

In [72]:
records = db.listingsAndReviews

In [73]:
# Count the documents
records.count_documents({})

5555

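Queries are made with the `find()` method, which returns a cursor over the matching documents. To look at a single document, `find_one()` avoids loading the whole collection into a list.

In [74]:
records.find_one()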

{'_id': '10009999',
 'listing_url': 'https://www.airbnb.com/rooms/10009999',
 'name': 'Horto flat with small garden',
 'summary': 'One bedroom + sofa-bed in quiet and bucolic neighbourhood right next to the Botanical Garden. Small garden, outside shower, well equipped kitchen and bathroom with shower and tub. Easy for transport with many restaurants and basic facilities in the area.',
 'space': 'Lovely one bedroom + sofa-bed in the living room, perfect for two but fits up to four comfortably.  There´s a small outside garden with a shower There´s a well equipped open kitchen with both 110V / 220V wall plugs and one bathroom with shower, tub and even a sauna machine! All newly refurbished!',
 'description': 'One bedroom + sofa-bed in quiet and bucolic neighbourhood right next to the Botanical Garden. Small garden, outside shower, well equipped kitchen and bathroom with shower and tub. Easy for transport with many restaurants and basic facilities in the area. Lovely one bedroom + sofa-bed

We filter listings with 2 bathrooms and 3 bedrooms that have at least one review.

In [75]:
list(records.find({'bathrooms':2,
                  'bedrooms':3,
                  'number_of_reviews':{'$ne':0}}).limit(3))

[{'_id': '10423504',
  'listing_url': 'https://www.airbnb.com/rooms/10423504',
  'name': 'Bondi Beach Dreaming 3-Bed House',
  'summary': "This peaceful house in North Bondi is 300m to the beach and a minute's walk to cafes and bars. With 3 bedrooms, (can sleep up to 8) it is perfect for families, friends and pets. The kitchen was recently renovated and a new lounge and chairs installed. The house has a peaceful, airy, laidback vibe  - a perfect beach retreat. Longer-term bookings encouraged. Parking for one car. A parking permit for a second car can also be obtained on request.",
  'space': "Serene space with three bedrooms, including a studio at the back, 300m to the beach and near to best cafes and bars in Bondi. Parking for one car. This wonderful house is designed to cater for families or groups with plenty of space and flexible bedding arrangements. There are three bedrooms including a master bedroom with a king bed, separate studio with a queen bed and a room with two single bed
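
To make the filter document above more concrete, here is a pure-Python sketch of how such a filter selects documents. It is not how `pymongo` works internally (the MongoDB server evaluates the query), and the helper `matches()` and the three listings are hypothetical. Only exact equality and the `$ne` operator are handled.

```python
def matches(document, query):
    """Return True if `document` satisfies a tiny subset of MongoDB filter syntax.

    Supported (illustrative only): exact equality and the `$ne` operator.
    """
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict) and "$ne" in condition:
            # {'field': {'$ne': x}} keeps documents whose value differs from x.
            if value == condition["$ne"]:
                return False
        elif value != condition:
            # {'field': x} keeps documents whose value equals x.
            return False
    return True


# Hypothetical listings, not taken from the sample_airbnb data.
listings = [
    {"_id": "a", "bathrooms": 2, "bedrooms": 3, "number_of_reviews": 12},
    {"_id": "b", "bathrooms": 2, "bedrooms": 3, "number_of_reviews": 0},
    {"_id": "c", "bathrooms": 1, "bedrooms": 3, "number_of_reviews": 4},
]

query = {"bathrooms": 2, "bedrooms": 3, "number_of_reviews": {"$ne": 0}}
matching_ids = [doc["_id"] for doc in listings if matches(doc, query)]
print(matching_ids)
```

All conditions in the filter must hold at once, so listing `b` is rejected for having no reviews and listing `c` for having a single bathroom.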

You can find more documentation on the `pymongo` library at https://api.mongodb.com/python/current/