# Pandas Query Language
This notebook demonstrates the **Pandas Query Language**, i.e. the expression syntax accepted by `DataFrame.query()`, using the `netflix_titles.csv` dataset. The `query()` method lets us apply complex filters and conditions to pandas DataFrames.

In [1]:
# Import libraries
import pandas as pd

# Load the Netflix dataset
df = pd.read_csv('netflix_titles.csv')

# Preview the first rows of the dataset
df.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,TV Show,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,"August 14, 2020",2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
1,s2,Movie,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,"December 23, 2016",2016,TV-MA,93 min,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
2,s3,Movie,23:59,Gilbert Chan,"Tedd Chan, Stella Chung, Henley Hii, Lawrence ...",Singapore,"December 20, 2018",2011,R,78 min,"Horror Movies, International Movies","When an army recruit is found dead, his fellow..."
3,s4,Movie,9,Shane Acker,"Elijah Wood, John C. Reilly, Jennifer Connelly...",United States,"November 16, 2017",2009,PG-13,80 min,"Action & Adventure, Independent Movies, Sci-Fi...","In a postapocalyptic world, rag-doll robots hi..."
4,s5,Movie,21,Robert Luketic,"Jim Sturgess, Kevin Spacey, Kate Bosworth, Aar...",United States,"January 1, 2020",2008,PG-13,123 min,Dramas,A brilliant group of students become card-coun...
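The `Unnamed: 0` column in the preview above is most likely an index that was saved along with the data by an earlier `to_csv()` call. A minimal sketch, assuming that is the case: passing `index_col=0` to `read_csv` uses that column as the DataFrame index instead of loading it as a data column. The two-row CSV text below is a made-up stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for netflix_titles.csv, including the unnamed index column
csv_text = """,show_id,type,title,release_year
0,s1,TV Show,3%,2020
1,s2,Movie,7:19,2016
"""

# Default: the leading column without a header becomes a regular column named "Unnamed: 0"
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())  # ['Unnamed: 0', 'show_id', 'type', 'title', 'release_year']

# index_col=0: the first column is used as the row index instead
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.columns.tolist())  # ['show_id', 'type', 'title', 'release_year']
```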


## 1. `query()` basics
The `query()` method filters rows based on a boolean expression passed as a string. The syntax resembles an SQL `WHERE` clause: column names can be used directly as variables inside the string.
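To make the connection to ordinary boolean indexing explicit, here is a small sketch on a hand-made four-row sample (not the real dataset). It shows that a query string selects the same rows as the equivalent boolean mask, and that Python variables can be referenced inside the string with `@`; the variable name `min_year` is our own choice.

```python
import pandas as pd

# Hypothetical mini version of the Netflix data
sample = pd.DataFrame({
    "title": ["3%", "7:19", "23:59", "9"],
    "type": ["TV Show", "Movie", "Movie", "Movie"],
    "release_year": [2020, 2016, 2011, 2009],
})

# The query string refers to columns by name ...
via_query = sample.query("release_year > 2015")

# ... and selects the same rows as the equivalent boolean mask
via_mask = sample[sample["release_year"] > 2015]

# Python variables from the surrounding scope are referenced with @
min_year = 2015
via_variable = sample.query("release_year > @min_year")

print(via_query["title"].tolist())  # ['3%', '7:19']
```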

In [4]:
# Titles (movies and TV shows) released after 2015
filtered = df.query('release_year > 2015')
filtered.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,TV Show,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,"August 14, 2020",2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
1,s2,Movie,7:19,Jorge Michel Grau,"Demián Bichir, Héctor Bonilla, Oscar Serrano, ...",Mexico,"December 23, 2016",2016,TV-MA,93 min,"Dramas, International Movies",After a devastating earthquake hits Mexico Cit...
5,s6,TV Show,46,Serdar Akar,"Erdal Beşikçioğlu, Yasemin Allen, Melis Birkan...",Turkey,"July 1, 2017",2016,TV-MA,1 Season,"International TV Shows, TV Dramas, TV Mysteries",A genetics professor experiments with a treatm...
6,s7,Movie,122,Yasir Al Yasiri,"Amina Khalil, Ahmed Dawood, Tarek Lotfy, Ahmed...",Egypt,"June 1, 2020",2019,TV-MA,95 min,"Horror Movies, International Movies","After an awful accident, a couple admitted to ..."
8,s9,Movie,706,Shravan Kumar,"Divya Dutta, Atul Kulkarni, Mohan Agashe, Anup...",India,"April 1, 2019",2019,TV-14,118 min,"Horror Movies, International Movies","When a doctor goes missing, his psychiatrist w..."


### Comparing column values
Columns can also be compared with each other. Example: show all titles whose release year is earlier than the year they were added to Netflix.

In [5]:
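# date_added is a string such as "August 14, 2020": parse it and keep only the year
year_added = pd.to_datetime(df['date_added'].str.strip(), format='%B %d, %Y', errors='coerce').dt.year
# Local variables are referenced inside the query string with @
df.query('release_year < @year_added').head()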
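Slicing syntax such as `str[:4]` is not supported inside a query string, and `date_added` holds text like "August 14, 2020", so its first four characters are not the year anyway. A sketch of another approach, assuming the year is first parsed into a helper column with `assign()` (the column name `year_added` is our own choice): once both values are numeric columns, they can be compared directly in the query. The sample rows reuse titles from the preview; the last row is hypothetical.

```python
import pandas as pd

# Hypothetical sample; date_added uses the dataset's "Month day, year" format
sample = pd.DataFrame({
    "title": ["3%", "23:59", "9", "Unknown date"],
    "release_year": [2020, 2011, 2009, 2018],
    "date_added": ["August 14, 2020", "December 20, 2018", " November 16, 2017", None],
})

# Strip stray whitespace, parse the date and keep only the year;
# missing or unparsable dates become NaN
year_added = pd.to_datetime(
    sample["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
).dt.year

# With two numeric columns, the comparison works in a plain query string
older_than_added = sample.assign(year_added=year_added).query("release_year < year_added")
print(older_than_added["title"].tolist())  # ['23:59', '9']
```

Rows with a missing `date_added` compare as `False` against `NaN` and are therefore dropped.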

## 2. Logical operators
Logical operators such as `and`, `or` and `not` can be used to build more complex queries:

In [7]:
# Movies or TV shows released after 2010 and added in 2020
filtered = df.query('release_year > 2010 and date_added.str.contains("2020", na=False)')
filtered.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,TV Show,3%,,"João Miguel, Bianca Comparato, Michel Gomes, R...",Brazil,"August 14, 2020",2020,TV-MA,4 Seasons,"International TV Shows, TV Dramas, TV Sci-Fi &...",In a future where the elite inhabit an island ...
6,s7,Movie,122,Yasir Al Yasiri,"Amina Khalil, Ahmed Dawood, Tarek Lotfy, Ahmed...",Egypt,"June 1, 2020",2019,TV-MA,95 min,"Horror Movies, International Movies","After an awful accident, a couple admitted to ..."
14,s15,Movie,3022,John Suits,"Omar Epps, Kate Walsh, Miranda Cosgrove, Angus...",United States,"March 19, 2020",2019,R,91 min,"Independent Movies, Sci-Fi & Fantasy, Thrillers",Stranded when the Earth is suddenly destroyed ...
24,s25,TV Show,​SAINT SEIYA: Knights of the Zodiac,,"Bryson Baugus, Emily Neves, Blake Shepard, Pat...",Japan,"January 23, 2020",2020,TV-14,2 Seasons,"Anime Series, International TV Shows",Seiya and the Knights of the Zodiac rise again...
26,s27,TV Show,(Un)Well,,,United States,"August 12, 2020",2020,TV-MA,1 Season,Reality TV,This docuseries takes a deep dive into the luc...
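`query()` also accepts the symbolic operators `&`, `|` and `~` in place of `and`, `or` and `not`. The sketch below, on a made-up sample, checks that both spellings select the same rows; the symbolic form is written with explicit parentheses so it reads unambiguously regardless of operator precedence.

```python
import pandas as pd

# Hypothetical sample based on titles from the preview
sample = pd.DataFrame({
    "title": ["3%", "7:19", "23:59", "9", "21"],
    "type": ["TV Show", "Movie", "Movie", "Movie", "Movie"],
    "release_year": [2020, 2016, 2011, 2009, 2008],
})

# Keyword operators: movies released between 2010 and 2015 (inclusive)
words = sample.query('type == "Movie" and not (release_year < 2010 or release_year > 2015)')

# Symbolic counterparts, fully parenthesised
symbols = sample.query('(type == "Movie") & ~((release_year < 2010) | (release_year > 2015))')

print(words["title"].tolist())  # ['23:59']
```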


## 3. String operations
Strings can be processed inside `query()` through the pandas `.str` accessor. Example: filter all titles that contain the word 'Love'.

In [8]:
# Filter titles containing the word 'Love' (case-insensitive; missing titles count as no match)
filtered = df.query('title.str.contains("Love", case=False, na=False)')
filtered.head()

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
198,s199,TV Show,A Little Thing Called First Love,,"Lai Kuan-lin, Zhao Jinmai, Wang Runze, Chai We...",China,"October 26, 2019",2019,TV-G,1 Season,"International TV Shows, Romantic TV Shows, Tee...",A shy college student with a knack for drawing...
199,s200,TV Show,A Love So Beautiful,,"Kim Yo-han, So Joo-yeon, Yeo Hoi-hyun, Jeong J...",South Korea,"December 28, 2020",2020,TV-PG,1 Season,"International TV Shows, Romantic TV Shows, TV ...",Love is as tough as it is sweet for a lovestru...
200,s201,Movie,A Love Song for Latasha,Sophia Nahli Allison,,United States,"September 21, 2020",2020,TV-PG,20 min,Documentaries,The killing of Latasha Harlins became a flashp...
201,s202,Movie,A Love Story,Maryo J. De los Reyes,"Maricel Soriano, Aga Muhlach, Angelica Pangani...",Philippines,"March 14, 2019",2007,TV-14,117 min,"Dramas, International Movies, Romantic Movies",Self-made millionaire Ian thinks he's found ha...
223,s224,Movie,A Secret Love,Chris Bolan,,United States,"April 29, 2020",2020,TV-14,83 min,"Documentaries, LGBTQ Movies","Amid shifting times, two women kept their deca..."
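The two keyword arguments in the cell above do different jobs: `case=False` ignores capitalisation, and `na=False` makes missing titles count as "no match" instead of producing `NaN` in the mask. The sketch below uses a hypothetical sample; note that `contains` matches substrings, so a title such as "Glove Box" is matched as well.

```python
import numpy as np
import pandas as pd

# Hypothetical titles, including a missing value
sample = pd.DataFrame({
    "title": ["A Love Story", "Lovesick", "Crazy, Stupid, LOVE", "Glove Box", "Dark", np.nan],
})

# case=False: "Love", "love" and "LOVE" all match; na=False: the missing title is excluded
contains_love = sample.query('title.str.contains("love", case=False, na=False)', engine="python")

print(contains_love["title"].tolist())
# ['A Love Story', 'Lovesick', 'Crazy, Stupid, LOVE', 'Glove Box']
```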


## 4. Combining conditions
Conditions can be grouped with parentheses to structure a query. Example: filter all TV shows released before 2015.

In [17]:
# Filter TV shows released before 2015
filtered = df.query('(type == "TV Show") and (release_year < 2015)')
filtered.head(3)

# Alternative using boolean indexing
# filtered = df[(df["type"] == "TV Show") & (df["release_year"] < 2015)]
# filtered.head(3)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
45,s46,TV Show,Şubat,,"Alican Yücesoy, Melisa Sözen, Musa Uzunlar, Se...",Turkey,"January 17, 2017",2013,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Dramas",An orphan subjected to tests that gave him sup...
61,s62,TV Show,12 Years Promise,,"So-yeon Lee, Namkoong Min, Tae-im Lee, So-hui ...",South Korea,"May 22, 2017",2014,TV-14,1 Season,"International TV Shows, Korean TV Shows, Roman...",A pregnant teen is forced by her family to lea...
80,s81,TV Show,20 Minutes,,"Tuba Büyüküstün, Ilker Aksum, Bülent Emin Yara...",Turkey,"August 15, 2017",2013,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Dramas","When his wife is convicted of murder, a horrif..."
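Parentheses matter as soon as `and` and `or` are mixed, because `and` binds more tightly than `or`, as in plain Python. In the sketch below (sample rows taken from the outputs above), the ungrouped query lets a movie through, while the grouped one keeps only TV shows.

```python
import pandas as pd

# Sample rows taken from the outputs above
sample = pd.DataFrame({
    "title": ["Şubat", "12 Years Promise", "3%", "A Love Song for Latasha"],
    "type": ["TV Show", "TV Show", "TV Show", "Movie"],
    "release_year": [2013, 2014, 2020, 2020],
})

# Read as: (TV show and before 2014) or (after 2019), so the 2020 movie is included
ungrouped = sample.query('type == "TV Show" and release_year < 2014 or release_year > 2019')

# The brackets force both year conditions to apply only to TV shows
grouped = sample.query('type == "TV Show" and (release_year < 2014 or release_year > 2019)')

print(ungrouped["title"].tolist())  # ['Şubat', '3%', 'A Love Song for Latasha']
print(grouped["title"].tolist())    # ['Şubat', '3%']
```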


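## 5. Membership tests (`in` and `not in`)
The `query()` method also supports `in` and `not in` to check whether a value appears in a list. The match is exact: a `country` entry that lists several countries, such as "United States, India", is not matched by "India".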

In [21]:
# Filter movies and TV shows from specific countries
filtered = df.query('country in ["United States", "India"] and release_year > 2010')
filtered.head(2)

# Alternative using boolean indexing
# filtered = df[(df['country'].isin(['United States', 'India'])) & (df['release_year'] > 2010)]
# filtered.head(2)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
8,s9,Movie,706,Shravan Kumar,"Divya Dutta, Atul Kulkarni, Mohan Agashe, Anup...",India,"April 1, 2019",2019,TV-14,118 min,"Horror Movies, International Movies","When a doctor goes missing, his psychiatrist w..."
10,s11,Movie,1922,Zak Hilditch,"Thomas Jane, Molly Parker, Dylan Schmid, Kaitl...",United States,"October 20, 2017",2017,TV-MA,103 min,"Dramas, Thrillers",A farmer pens a confession admitting to his wi...
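The list does not have to be written into the query string: it can live in a Python variable referenced with `@`. The sketch below also shows `not in`, the exact-match behaviour for multi-country entries (the title "Co-production" is hypothetical), and the documented shorthand where `==` against a list behaves like `in`.

```python
import pandas as pd

# Hypothetical sample; the last row lists two countries in one cell
sample = pd.DataFrame({
    "title": ["706", "1922", "3%", "Co-production"],
    "country": ["India", "United States", "Brazil", "United States, India"],
})

countries = ["United States", "India"]

# @ refers to a Python variable from the surrounding scope
in_list = sample.query("country in @countries")
not_in_list = sample.query("country not in @countries")

# Comparing a column to a list with == works like "in"
via_equals = sample.query('country == ["United States", "India"]')

print(in_list["title"].tolist())      # ['706', '1922']
print(not_in_list["title"].tolist())  # ['3%', 'Co-production']
```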


## 6. Filtering missing values
Sometimes it is useful to filter for missing or non-missing values.

In [None]:
# Filter titles whose date_added value is missing
filtered = df.query('date_added.isnull()', engine='python')
filtered.head()
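`isnull()` is an alias of `isna()`, and its counterpart `notna()` selects the rows where a value is present. A short sketch on a hypothetical sample (the title "Mystery Title" is made up):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing date_added value
sample = pd.DataFrame({
    "title": ["3%", "Mystery Title", "9"],
    "date_added": ["August 14, 2020", np.nan, "November 16, 2017"],
})

# Rows where date_added is missing
missing = sample.query("date_added.isna()", engine="python")
# Rows where date_added is present
present = sample.query("date_added.notna()", engine="python")

print(missing["title"].tolist())  # ['Mystery Title']
print(present["title"].tolist())  # ['3%', '9']
```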

## 7. Sorted queries
Sort the filtered data by a specific column.

In [None]:
# Filter titles released after 2000 and sort them, newest first
filtered = df.query('release_year > 2000').sort_values(by='release_year', ascending=False)
filtered.head()
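Because `query()` returns an ordinary DataFrame, it can be chained with `sort_values()` using several sort keys. A sketch on a sample of titles from the outputs above, assuming we want the newest years first and, within the same year, titles in alphabetical order:

```python
import pandas as pd

# Sample of titles and release years from the outputs above
sample = pd.DataFrame({
    "title": ["9", "7:19", "3%", "(Un)Well", "21"],
    "release_year": [2009, 2016, 2020, 2020, 2008],
})

# Filter first, then sort: newest year first, titles alphabetically within a year
result = (
    sample.query("release_year > 2000")
    .sort_values(by=["release_year", "title"], ascending=[False, True])
)

print(result["title"].tolist())  # ['(Un)Well', '3%', '7:19', '9', '21']
```

Strings are compared character by character, so "(Un)Well" sorts before "3%" because "(" comes before "3".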

## 8. More examples
### Example 1: TV shows from Japan with "Anime" in the title
### Example 2: Movies longer than 120 minutes

In [None]:
# Example 1: TV shows from Japan with 'Anime' in the title
anime_shows = df.query('(type == "TV Show") and (country == "Japan") and title.str.contains("Anime", case=False, na=False)', engine='python')
anime_shows.head()

In [None]:
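# Example 2: movies longer than 120 minutes (duration looks like "93 min");
# expand=False makes str.extract return a Series instead of a one-column DataFrame
long_movies = df.query('(type == "Movie") and (duration.str.extract("([0-9]+)", expand=False).astype(float) > 120)', engine='python')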
long_movies.head()
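An alternative that keeps the query string simple is to extract the number of minutes into a helper column first (the name `duration_num` is our own choice) and then filter on it. Sketch on sample rows from the outputs above:

```python
import pandas as pd

# Sample rows from the outputs above
sample = pd.DataFrame({
    "title": ["21", "706", "3%", "A Love Song for Latasha"],
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "duration": ["123 min", "118 min", "4 Seasons", "20 min"],
})

# Extract the leading number ("123 min" -> 123.0); TV shows get their season count,
# which the type filter below excludes anyway
sample = sample.assign(
    duration_num=sample["duration"].str.extract(r"(\d+)", expand=False).astype(float)
)

# With a plain numeric column, the default engine can evaluate the query
long_movies = sample.query('type == "Movie" and duration_num > 120')

print(long_movies["title"].tolist())  # ['21']
```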

## Summary
The `query()` method in pandas offers a concise, readable way to filter DataFrames, especially when conditions get complex. It supports logical operators, string operations via the `.str` accessor, and even the handling of missing values. Try the examples shown here with your own data!