# **Design the schema**

**Describe the dataset's key entities and fields (data types, keys, nullability).**

| Field        | Suggested type | Nullable | Description                       |
| ------------ | -------------- | -------- | --------------------------------- |
| show_id      | STRING         | NO       | Unique ID of the title            |
| type         | STRING         | NO       | Movie / TV Show                   |
| title        | STRING         | NO       | Name of the content               |
| director     | STRING         | YES      | Director (may be empty)           |
| cast         | STRING         | YES      | List of actors                    |
| country      | STRING         | YES      | Country or countries              |
| date_added   | DATE           | YES      | Date the title was added          |
| release_year | INT            | NO       | Release year                      |
| rating       | STRING         | YES      | Rating (PG, R, etc.)              |
| duration     | STRING         | YES      | Running time or number of seasons |
| listed_in    | STRING         | YES      | Categories/genres                 |
| description  | STRING         | YES      | Synopsis                          |

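The `duration` field stores two different measures in a single string column: running time for movies (for example `90 min`) and season counts for TV shows (for example `2 Seasons`). The sketch below shows one way that value could be split into a number and a unit before analysis. It is plain Python rather than Spark, and the function name `parse_duration` and the unit labels `minutes` / `seasons` are illustrative assumptions, not part of the notebook.

```python
import re
from typing import Optional, Tuple

# Matches values such as "90 min", "1 Season", "2 Seasons".
_DURATION_RE = re.compile(r"^\s*(\d+)\s+(min|Seasons?)\s*$")


def parse_duration(raw: Optional[str]) -> Optional[Tuple[int, str]]:
    """Split a duration string into (value, unit); return None if it does not match."""
    if raw is None:
        return None
    match = _DURATION_RE.match(raw)
    if match is None:
        return None
    value = int(match.group(1))
    unit = "minutes" if match.group(2) == "min" else "seasons"
    return value, unit


print(parse_duration("90 min"))     # (90, 'minutes')
print(parse_duration("2 Seasons"))  # (2, 'seasons')
print(parse_duration(""))           # None
```

Returning `None` for anything that does not match keeps malformed values visible instead of silently coercing them.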

**Propose a DDL (Spark SQL) or a StructType (PySpark) that represents the schema.**

    CREATE TABLE IF NOT EXISTS jobs_catalogo.jobs_schema_netflix.movies (
        show_id STRING NOT NULL,
        type STRING NOT NULL,
        title STRING NOT NULL,
        director STRING,
        cast STRING,
        country STRING,
        date_added DATE,
        release_year INT NOT NULL,
        rating STRING,
        duration STRING,
        listed_in STRING,
        description STRING
    )
    USING DELTA;
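
The columns declared NOT NULL in this DDL (`show_id`, `type`, `title`, `release_year`) can also be checked before the data is loaded. The sketch below is one possible pre-load check in plain Python, assuming rows arrive as dictionaries of strings. `REQUIRED_FIELDS`, `VALID_TYPES`, `validate_row`, and the two sample rows are hypothetical, introduced only for illustration.

```python
from typing import Dict, List, Optional

# Columns declared NOT NULL in the proposed schema.
REQUIRED_FIELDS = ("show_id", "type", "title", "release_year")
# Values documented for the `type` column in the data dictionary.
VALID_TYPES = {"Movie", "TV Show"}


def validate_row(row: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems found in the row; an empty list means it passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        if value is None or value.strip() == "":
            problems.append(f"{field} is required but empty")
    row_type = row.get("type")
    if row_type and row_type not in VALID_TYPES:
        problems.append(f"unexpected type: {row_type!r}")
    year = row.get("release_year")
    if year and not year.strip().isdigit():
        problems.append(f"release_year is not an integer: {year!r}")
    return problems


# Hypothetical rows: one well-formed, one deliberately malformed.
good = {"show_id": "s1", "type": "Movie", "title": "Dick Johnson Is Dead", "release_year": "2020"}
bad = {"show_id": "s9999", "type": "William Wyler", "title": "", "release_year": "n/a"}

print(validate_row(good))
print(validate_row(bad))
```

Collecting every problem rather than stopping at the first one makes it easier to see whether a bad row is a one-off or a sign of shifted columns.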


**Include a simple diagram (Mermaid/draw.io) or a data dictionary table in a Markdown cell.**

![](/Volumes/jobs_catalogo/jobs_schema_netflix/netflix1/METASTORE NETFLIX.drawio.png)


# **Set up and document the infrastructure in Databricks CE**

To launch a paid cluster in Databricks, the general process is:

1. Sign in to Databricks on your platform (Azure Databricks, AWS Databricks, etc.).
2. Go to the Clusters section and choose to create a new cluster.
3. Name the cluster and select the required configuration: virtual machine type, cluster size (number of worker nodes), and runtime (Databricks Runtime version).
4. Configure advanced parameters if needed (Spark configuration, auto-termination policies, access permissions).
5. Set the minimum and maximum number of nodes for autoscaling according to the workload.
6. If additional libraries are needed, add them so they are installed when the cluster starts.
7. Create the cluster and wait for it to be provisioned and started.

Once created, the cluster can run notebooks, jobs, and pipelines, and it can be stopped or deleted to reduce costs.
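
The same settings can be written down as a cluster definition instead of being selected by hand. The sketch below only assembles and prints such a definition; it does not call Databricks. The field names are intended to mirror the request body of the Databricks Clusters API, which is an assumption to verify against the official reference. The cluster name, the `<runtime-version>` and `<node-type>` placeholders, and the helper `build_cluster_spec` are hypothetical.

```python
import json


def build_cluster_spec(name, runtime_version, node_type, min_workers, max_workers,
                       autotermination_minutes):
    """Assemble a cluster definition as a plain dictionary."""
    if min_workers > max_workers:
        raise ValueError("min_workers cannot exceed max_workers")
    return {
        "cluster_name": name,
        "spark_version": runtime_version,   # Databricks Runtime version
        "node_type_id": node_type,          # virtual machine type
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": autotermination_minutes,
    }


spec = build_cluster_spec(
    name="netflix-analysis",              # hypothetical cluster name
    runtime_version="<runtime-version>",  # placeholder, fill in from your workspace
    node_type="<node-type>",              # placeholder, depends on the cloud provider
    min_workers=1,
    max_workers=4,
    autotermination_minutes=30,
)
print(json.dumps(spec, indent=2))
```

Keeping the definition in code makes the chosen node counts and auto-termination setting easy to review and reproduce.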

A paid cluster consumes DBUs (Databricks Units), which depend on the size and number of nodes; configuring auto-termination helps control costs by ensuring the cluster does not stay active while unused.
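
A rough calculation shows why auto-termination matters. The sketch below assumes a simplified model in which the DBU charge is the DBU rate per node-hour times the number of nodes times the hours running, multiplied by a price per DBU. Every rate in it is hypothetical. Actual billing depends on the instance type, workload type, and plan, and the cloud provider's virtual machine charges are not included.

```python
def estimate_dbu_cost(dbu_per_node_hour, nodes, hours, price_per_dbu):
    """Estimate the DBU charge: DBUs consumed multiplied by the price per DBU."""
    dbus = dbu_per_node_hour * nodes * hours
    return dbus * price_per_dbu


# Hypothetical rates: 1 DBU per node-hour at 0.5 currency units per DBU, 3 nodes.
always_on = estimate_dbu_cost(dbu_per_node_hour=1.0, nodes=3, hours=24, price_per_dbu=0.5)
auto_stop = estimate_dbu_cost(dbu_per_node_hour=1.0, nodes=3, hours=4, price_per_dbu=0.5)

print(always_on)  # 36.0
print(auto_stop)  # 6.0
```

Under these assumed numbers, a cluster that shuts down after four hours of actual use costs one sixth of one left running all day.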



# **Get data from Kaggle and create a table**

In [0]:
%sql
CREATE CATALOG IF NOT EXISTS jobs_catalogo
    


In [0]:
%sql
CREATE SCHEMA IF NOT EXISTS jobs_catalogo.jobs_schema_netflix

In [0]:
%sql
CREATE TABLE IF NOT EXISTS jobs_catalogo.jobs_schema_netflix.movies (
  show_id STRING,
  type STRING,
  title STRING,
  director STRING,
  cast STRING,
  country STRING,
  date_added STRING,
  release_year STRING,
  rating STRING,
  duration STRING,
  listed_in STRING,
  description STRING
) USING DELTA;


In [0]:
%sql
COPY INTO jobs_catalogo.jobs_schema_netflix.movies
FROM '/Volumes/jobs_catalogo/jobs_schema_netflix/netflix1/netflix_titles.csv'
FILEFORMAT = CSV
FORMAT_OPTIONS ( "header" = "true")
COPY_OPTIONS ("mergeSchema" = "true");
    


num_affected_rows,num_inserted_rows,num_skipped_corrupt_files
8809,8809,0


# **Validations in Spark and SQL**

In [0]:
%sql
DESCRIBE TABLE jobs_catalogo.jobs_schema_netflix.movies

col_name,data_type,comment
show_id,string,
type,string,
title,string,
director,string,
cast,string,
country,string,
date_added,string,
release_year,string,
rating,string,
duration,string,


In [0]:
%sql
SHOW CREATE TABLE jobs_catalogo.jobs_schema_netflix.movies


createtab_stmt
"CREATE TABLE jobs_catalogo.jobs_schema_netflix.movies (  show_id STRING,  type STRING,  title STRING,  director STRING,  cast STRING,  country STRING,  date_added STRING,  release_year STRING,  rating STRING,  duration STRING,  listed_in STRING,  description STRING) USING delta COLLATION 'UTF8_BINARY' TBLPROPERTIES (  'delta.enableDeletionVectors' = 'true',  'delta.enableRowTracking' = 'true',  'delta.feature.appendOnly' = 'supported',  'delta.feature.deletionVectors' = 'supported',  'delta.feature.domainMetadata' = 'supported',  'delta.feature.invariants' = 'supported',  'delta.feature.rowTracking' = 'supported',  'delta.minReaderVersion' = '3',  'delta.minWriterVersion' = '7')"


In [0]:
%sql
SELECT * FROM jobs_catalogo.jobs_schema_netflix.movies LIMIT 10;

show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable."
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Maake Ka-Ncube, Odwa Gwanya, Mekaila Mathys, Sandi Schultz, Duane Williams, Shamilla Miller, Patrick Mofokeng",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town teen sets out to prove whether a private-school swimming star is her sister who was abducted at birth."
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, Geert Van Rampelberg, Bakary Diombera",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Action & Adventure","To protect his family from a powerful drug lord, skilled thief Mehdi and his expert team of robbers are pulled into a violent and deadly turf war."
s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down among the incarcerated women at the Orleans Justice Center in New Orleans on this gritty reality series."
s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam Khan, Ahsaas Channa, Revathi Pillai, Urvi Singh, Arun Kumar",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV Comedies","In a city of coaching centers known to train India’s finest collegiate minds, an earnest but unexceptional student and his friends navigate campus life."
s6,TV Show,Midnight Mass,Mike Flanagan,"Kate Siegel, Zach Gilford, Hamish Linklater, Henry Thomas, Kristin Lehman, Samantha Sloyan, Igby Rigney, Rahul Kohli, Annarah Cymone, Annabeth Gish, Alex Essoe, Rahul Abburi, Matt Biedel, Michael Trucco, Crystal Balint, Louis Oliver",,"September 24, 2021",2021,TV-MA,1 Season,"TV Dramas, TV Horror, TV Mysteries","The arrival of a charismatic young priest brings glorious miracles, ominous mysteries and renewed religious fervor to a dying town desperate to believe."
s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, Sofia Carson, Liza Koshy, Ken Jeong, Elizabeth Perkins, Jane Krakowski, Michael McKean, Phil LaMarr",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,"Equestria's divided. But a bright-eyed hero believes Earth Ponies, Pegasi and Unicorns should be pals — and, hoof to heart, she’s determined to prove it."
s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra Duah, Nick Medley, Mutabaruka, Afemo Omilami, Reggie Carter, Mzuri","United States, Ghana, Burkina Faso, United Kingdom, Germany, Ethiopia","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model slips back in time, becomes enslaved on a plantation and bears witness to the agony of her ancestral past."
s9,TV Show,The Great British Baking Show,Andy Devonshire,"Mel Giedroyc, Sue Perkins, Mary Berry, Paul Hollywood",United Kingdom,"September 24, 2021",2021,TV-14,9 Seasons,"British TV Shows, Reality TV","A talented batch of amateur bakers face off in a 10-week competition, whipping up their best dishes in the hopes of being named the U.K.'s best."
s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, Timothy Olyphant, Daveed Diggs, Skyler Gisondo, Laura Harrier, Rosalind Chao, Kimberly Quinn, Loretta Devine, Ravi Kapoor",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contends with a feisty bird that's taken over her garden — and a husband who's struggling to find a way forward.
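
The sample rows show `date_added` stored as text such as `September 25, 2021` and `release_year` as a string, while the designed schema expects `DATE` and `INT`. The sketch below shows how those conversions could be done in plain Python. The function names are hypothetical, and the `%B` month directive assumes an English locale. In Spark SQL a comparable conversion would likely use `to_date` with a matching pattern and a `CAST` to `INT`, which the notebook does not show.

```python
from datetime import date, datetime
from typing import Optional


def parse_date_added(raw: Optional[str]) -> Optional[date]:
    """Parse text like 'September 25, 2021' into a date; None when empty or invalid."""
    if raw is None or raw.strip() == "":
        return None
    try:
        return datetime.strptime(raw.strip(), "%B %d, %Y").date()
    except ValueError:
        return None


def parse_release_year(raw: Optional[str]) -> Optional[int]:
    """Convert a year string to int; None when it is empty or not numeric."""
    if raw is None or not raw.strip().isdigit():
        return None
    return int(raw.strip())


print(parse_date_added("September 25, 2021"))  # 2021-09-25
print(parse_release_year("2020"))              # 2020
print(parse_date_added(""))                    # None
```

Mapping unparseable values to `None` matches the nullable `date_added` column in the design. For `release_year`, which is declared NOT NULL, a `None` result would flag a row that needs attention.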


In [0]:
%sql
SELECT 
  type,
  COUNT(*) AS total
FROM jobs_catalogo.jobs_schema_netflix.movies
GROUP BY type
ORDER BY total DESC;

type,total
Movie,6131
TV Show,2676
,1
William Wyler,1
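
Two rows in this result have a `type` that is neither `Movie` nor `TV Show` (an empty value and `William Wyler`), and 6131 + 2676 + 2 equals the 8809 rows loaded. One plausible cause, not confirmed in the notebook, is that a record was split or shifted while the CSV was parsed, for example by a line break inside a quoted field. The sketch below uses Python's `csv` module on an invented two-record sample to show how naive line splitting can push a director's name into the `type` position, while a quote-aware parser keeps the record intact.

```python
import csv
import io

# Invented sample: the second record has a line break inside a quoted title.
SAMPLE = (
    'show_id,type,title,director\n'
    's1,Movie,Dick Johnson Is Dead,Kirsten Johnson\n'
    's2,Movie,"Example title\nwith a line break",William Wyler\n'
)

# Naive approach: every physical line is treated as one record.
naive_rows = [line.split(",") for line in SAMPLE.splitlines()[1:]]

# Quote-aware approach: csv.reader keeps the quoted line break inside one field.
reader = csv.reader(io.StringIO(SAMPLE, newline=""))
header = next(reader)
parsed_rows = list(reader)

print(len(naive_rows))                  # 3 records, one of them spurious
print(len(parsed_rows))                 # 2 records
print([row[1] for row in naive_rows])   # second column of each naive record
```

Spark's CSV reader exposes a `multiLine` option for quoted line breaks. Whether setting it in `FORMAT_OPTIONS` resolves the two rows seen here is untested and would need to be checked against the file.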


In [0]:
%sql
SELECT COUNT(*) AS total_registros
FROM jobs_catalogo.jobs_schema_netflix.movies;

total_registros
8809


In [0]:
%sql
SELECT COUNT(*) AS total_colombia
FROM jobs_catalogo.jobs_schema_netflix.movies
WHERE country = 'Colombia';

total_colombia
35
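
The equality filter in this query counts only rows whose `country` is exactly `Colombia`. The sample rows show that `country` can hold a comma-separated list (for example `United States, Ghana, Burkina Faso, ...`), so co-productions that include Colombia would not be counted. The sketch below illustrates the difference on invented values. `split_countries` and `count_titles_for` are hypothetical helper names.

```python
from typing import Iterable, List, Optional


def split_countries(raw: Optional[str]) -> List[str]:
    """Split a comma-separated country string into trimmed names."""
    if not raw:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


def count_titles_for(country: str, values: Iterable[Optional[str]]) -> int:
    """Count values whose country list contains the given country."""
    return sum(1 for raw in values if country in split_countries(raw))


# Invented country values, including co-productions and missing data.
sample = ["Colombia", "United States, Colombia", "Colombia, Mexico", "United States", None, ""]

exact = sum(1 for raw in sample if raw == "Colombia")
inclusive = count_titles_for("Colombia", sample)

print(exact)      # 1
print(inclusive)  # 3
```

Which count is correct depends on the question being asked: titles produced only in Colombia, or titles with any Colombian participation.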


In [0]:
%sql
SELECT COUNT(*) 
FROM jobs_catalogo.jobs_schema_netflix.movies
WHERE type = 'Movie';

COUNT(*)
6131


In [0]:
%sql
SELECT type, COUNT(*) AS total
FROM jobs_catalogo.jobs_schema_netflix.movies
GROUP BY type;

type,total
Movie,6131
TV Show,2676
,1
William Wyler,1


# **Advantages and disadvantages: SQL vs Spark**

| Aspect | SQL | Spark (PySpark) |
|--------|-----|------------------|
| Ease of use | Very simple and declarative | Steeper learning curve |
| Scalability | Limited by the engine | Massive distributed processing |
| Complex transformations | Limited | Very flexible, supports UDFs |
| BI integration | Direct and native | Requires exporting / Delta |
| Performance | Excellent on medium-sized datasets | Superior for Big Data |
