# Basic concepts

Answer and discuss the following questions specifically for the San Francisco Library Usage dataset downloaded in Unit 1. Record your results as bullet points.

In [2]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv("/home/katja/2024-2025-Data_Lirbrarian_Katja_Gödde/Modul_3/data/Library_Usage.csv", dtype={"Within San Francisco County": str}, na_values=['Null', 'NA', ''], low_memory=False)  

In [4]:
df

Unnamed: 0,Patron Type Definition,Total Checkouts,Total Renewals,Age Range,Home Library Definition,Circulation Active Month,Circulation Active Year,Notice Preference Definition,Provided Email Address,Year Patron Registered,Within San Francisco County
0,Senior,5,0,75 years and over,Main,Nov,2022.0,Email,True,2015,False
1,Adult,0,0,45 to 54 years,Main,Jul,2023.0,Email,True,2019,False
2,Adult,0,0,55 to 59 years,Western Addition,Mar,2024.0,Email,True,2022,False
3,Welcome,1,1,20 to 24 years,Richmond,Aug,2022.0,Email,True,2022,False
4,Senior,0,0,65 to 74 years,Sunset,Mar,2024.0,Print,False,2023,False
...,...,...,...,...,...,...,...,...,...,...,...
450354,Digital Access Card,0,0,35 to 44 years,Ingleside,,,Email,False,2023,
450355,Digital Access Card,0,0,45 to 54 years,Ingleside,Aug,2022.0,Print,False,2022,
450356,Digital Access Card,0,0,25 to 34 years,Ingleside,Aug,2022.0,Print,False,2022,
450357,Digital Access Card,0,0,35 to 44 years,Ingleside,Apr,2022.0,Print,False,2022,


How many features does the dataset have?
The dataset has 11 features, each represented by a column name.

What is the sample size of the dataset? 450359 (the number of rows)
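Both counts can be read from `DataFrame.shape`. The following is a minimal sketch on a small stand-in table; the three rows are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the Library Usage table (values invented).
toy = pd.DataFrame({
    "Patron Type Definition": ["Adult", "Senior", "Juvenile"],
    "Total Checkouts": [12, 0, 3],
    "Total Renewals": [4, 0, 1],
})

# shape is a (rows, columns) tuple: rows = sample size, columns = features.
n_observations, n_features = toy.shape
print(f"{n_observations} observations, {n_features} features")
```

Applied to the real `df`, the same two numbers should match the answers given here, provided no helper columns have been added yet.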

Who or what are the statistical units? The statistical units are the library patrons, on whom the feature values are measured.

Over what period were the data collected? From 2 December 2016 to 25 March 2024 (see the metadata at https://data.sfgov.org/Culture-and-Recreation/Library-Usage/qzz6-2jup/about_data)

How can the population be described? Is it a full census? The population consists of all patrons of the library. It is not a full census, because only active patrons were included rather than all patrons. Active patrons are patrons whose library cards have not yet expired and patrons who have had circulation activity within the last three years.

Which features are continuous? Which are discrete?
- Patron Type Definition: discrete, since it can take only countably many values.
- Total Checkouts: discrete, since it can take only countably many values.
- Total Renewals: discrete, since it can take only countably many values.
- Age Range: discrete, since it consists of countably many age groups. Without the grouping, the feature would be continuous.
- Home Library Definition: discrete, since it can take only countably many values.
- Circulation Active Month: discrete, since it can take only countably many values.
- Circulation Active Year: discrete, since it can take only countably many values.
- Notice Preference Definition: discrete, since it can take only countably many values.
- Provided Email Address: discrete, since it can take only countably many values.
- Year Patron Registered: discrete, since it can take only countably many values.
- Within San Francisco County: discrete, since it can take only countably many values.

Which level of measurement does each feature have (nominal, ordinal, or metric scale)?
- Patron Type Definition: nominal scale
- Total Checkouts: metric scale
- Total Renewals: metric scale
- Age Range: ordinal scale
- Home Library Definition: nominal scale
- Circulation Active Month: ordinal scale
- Circulation Active Year: metric scale
- Notice Preference Definition: nominal scale
- Provided Email Address: nominal scale
- Year Patron Registered: metric scale
- Within San Francisco County: nominal scale
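An ordinal feature such as Age Range can be represented in pandas as an ordered categorical, which makes the ranking explicit. This is only a sketch: the category labels are the ones returned by `df['Age Range'].unique()`, while the three sample values and the variable names are chosen for illustration.

```python
import pandas as pd

# Age groups in their natural order (labels as they appear in the dataset).
age_order = [
    "0 to 9 years", "10 to 19 years", "20 to 24 years", "25 to 34 years",
    "35 to 44 years", "45 to 54 years", "55 to 59 years", "60 to 64 years",
    "65 to 74 years", "75 years and over",
]

# Hypothetical sample of the column (values picked for illustration).
ages = pd.Series(["45 to 54 years", "0 to 9 years", "75 years and over"])

# An ordered categorical encodes the ordinal scale: comparisons and
# min/max are defined, while arithmetic such as a mean is not.
ages_ordinal = ages.astype(pd.CategoricalDtype(age_order, ordered=True))
youngest = ages_ordinal.min()
oldest = ages_ordinal.max()
print(youngest, "<", oldest)
```

For nominal features such as Home Library Definition, an unordered categorical would be the matching representation, since no ranking exists between the values.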

Does the dataset contain missing values? Yes, in the columns Age Range, Home Library Definition, Circulation Active Month, Circulation Active Year, Notice Preference Definition and Within San Francisco County.

In [13]:
fehlende_werte = df.isnull().sum()
print(fehlende_werte)

Patron Type Definition              0
Total Checkouts                     0
Total Renewals                      0
Age Range                         510
Home Library Definition            30
Circulation Active Month        40317
Circulation Active Year         40317
Notice Preference Definition     3315
Provided Email Address              0
Year Patron Registered              0
Within San Francisco County       221
dtype: int64
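Absolute counts are easier to judge when set against the number of rows. In the output above, the 40317 missing months out of 450359 rows correspond to roughly 9 %. A minimal sketch of how such shares could be computed, using an invented four-row table:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in table with one missing Age Range (values invented).
toy = pd.DataFrame({
    "Age Range": ["25 to 34 years", np.nan, "0 to 9 years", "10 to 19 years"],
    "Total Checkouts": [5, 0, 2, 7],
})

# isna() yields booleans; their column-wise mean is the share of missing values.
missing_share = toy.isna().mean() * 100
print(missing_share.round(1))
```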


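Are these cross-sectional, longitudinal, or panel data? These are cross-sectional data: each patron appears in exactly one row, and checkouts and renewals are recorded as totals for a single snapshot rather than repeatedly over time. Panel data would require several observations of the same units (library patrons) at different points in time.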

# 2.5 Case study: Feature engineering

In [16]:
df.columns

Index(['Patron Type Definition', 'Total Checkouts', 'Total Renewals',
       'Age Range', 'Home Library Definition', 'Circulation Active Month',
       'Circulation Active Year', 'Notice Preference Definition',
       'Provided Email Address', 'Year Patron Registered',
       'Within San Francisco County'],
      dtype='object')

In [17]:
df['Total Checkouts']

0         5
1         0
2         0
3         1
4         0
         ..
450354    0
450355    0
450356    0
450357    0
450358    0
Name: Total Checkouts, Length: 450359, dtype: int64

In [18]:
df[['Total Checkouts', 'Total Renewals']]

Unnamed: 0,Total Checkouts,Total Renewals
0,5,0
1,0,0
2,0,0
3,1,1
4,0,0
...,...,...
450354,0,0
450355,0,0
450356,0,0
450357,0,0


In [19]:
df['is_adult'] = df['Patron Type Definition'] == 'Adult'
df['log_renewals'] = np.log(df['Total Renewals'] + 1)

In [20]:
df['is_adult']

0         False
1          True
2          True
3         False
4         False
          ...  
450354    False
450355    False
450356    False
450357    False
450358    False
Name: is_adult, Length: 450359, dtype: bool

In [21]:
df['log_renewals']

0         0.000000
1         0.000000
2         0.000000
3         0.693147
4         0.000000
            ...   
450354    0.000000
450355    0.000000
450356    0.000000
450357    0.000000
450358    0.000000
Name: log_renewals, Length: 450359, dtype: float64
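The `+ 1` keeps the logarithm defined for patrons with zero renewals. NumPy offers `np.log1p` for exactly this transformation and `np.expm1` as its inverse. A short sketch with a few renewal counts that also occur in the outputs of this notebook:

```python
import numpy as np
import pandas as pd

# A few renewal counts, including zero.
renewals = pd.Series([0, 1, 19, 5416])

log_plus_one = np.log(renewals + 1)   # variant used in the cell above
log1p = np.log1p(renewals)            # equivalent built-in

# expm1 undoes the transformation: exp(x) - 1.
recovered = np.expm1(log1p)
print(np.allclose(log_plus_one, log1p), np.allclose(recovered, renewals))
```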

In [22]:
df

Unnamed: 0,Patron Type Definition,Total Checkouts,Total Renewals,Age Range,Home Library Definition,Circulation Active Month,Circulation Active Year,Notice Preference Definition,Provided Email Address,Year Patron Registered,Within San Francisco County,is_adult,log_renewals
0,Senior,5,0,75 years and over,Main,Nov,2022.0,Email,True,2015,False,False,0.000000
1,Adult,0,0,45 to 54 years,Main,Jul,2023.0,Email,True,2019,False,True,0.000000
2,Adult,0,0,55 to 59 years,Western Addition,Mar,2024.0,Email,True,2022,False,True,0.000000
3,Welcome,1,1,20 to 24 years,Richmond,Aug,2022.0,Email,True,2022,False,False,0.693147
4,Senior,0,0,65 to 74 years,Sunset,Mar,2024.0,Print,False,2023,False,False,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
450354,Digital Access Card,0,0,35 to 44 years,Ingleside,,,Email,False,2023,,False,0.000000
450355,Digital Access Card,0,0,45 to 54 years,Ingleside,Aug,2022.0,Print,False,2022,,False,0.000000
450356,Digital Access Card,0,0,25 to 34 years,Ingleside,Aug,2022.0,Print,False,2022,,False,0.000000
450357,Digital Access Card,0,0,35 to 44 years,Ingleside,Apr,2022.0,Print,False,2022,,False,0.000000


In [23]:
df['Membership Duration'] = ('Circulation Active Year' - 'Year Patron Registered')*12 + 'Circulation Active Month'

TypeError: unsupported operand type(s) for -: 'str' and 'str'

In [33]:
pd.to_numeric(
  df['Circulation Active Year'], errors='coerce'
)

0         2022.0
1         2023.0
2         2024.0
3         2022.0
4         2024.0
           ...  
450354       NaN
450355    2022.0
450356    2022.0
450357    2022.0
450358    2023.0
Name: Circulation Active Year, Length: 450359, dtype: float64

In [35]:
pd.to_datetime(
    df['Circulation Active Month'],
    errors='coerce',
    format="%b"
)


0        1900-11-01
1        1900-07-01
2        1900-03-01
3        1900-08-01
4        1900-03-01
            ...    
450354          NaT
450355   1900-08-01
450356   1900-08-01
450357   1900-04-01
450358   1900-04-01
Name: Circulation Active Month, Length: 450359, dtype: datetime64[ns]

In [37]:
df['Circulation Active Month'].dt.month

AttributeError: Can only use .dt accessor with datetimelike values

In [39]:
df['Circulation Active Month'].head() 

0    Nov
1    Jul
2    Mar
3    Aug
4    Mar
Name: Circulation Active Month, dtype: object

In [41]:
df['Circulation Active Month'].unique()

array(['Nov', 'Jul', 'Mar', 'Aug', 'Apr', 'Feb', 'Jan', nan, 'May', 'Oct',
       'Sep', 'Jun', 'Dec'], dtype=object)

In [43]:
df['Membership Duration'] = ('Circulation Active Year' - 'Year Patron Registered')*12 + 'Circulation Active Month'


TypeError: unsupported operand type(s) for -: 'str' and 'str'

In [45]:
df['Membership Duration'].fillna(0)

KeyError: 'Membership Duration'
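The `TypeError` arises because the expression subtracts the column names as plain strings instead of the columns themselves. The `AttributeError` arises because the result of `pd.to_datetime` was never assigned, so the column still holds strings. The following sketch shows one way the calculation might be completed. It uses an invented three-row table with the dataset's column names, and `month_number` is a helper name chosen here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; column names match the dataset, values are invented.
toy = pd.DataFrame({
    "Circulation Active Year": [2022.0, 2024.0, np.nan],
    "Circulation Active Month": ["Nov", "Mar", np.nan],
    "Year Patron Registered": [2015, 2022, 2023],
})

# Parse the month abbreviation and keep only the month number (1 to 12).
month_number = pd.to_datetime(
    toy["Circulation Active Month"], format="%b", errors="coerce"
).dt.month

# Index the DataFrame to get the columns; bare strings cannot be subtracted.
toy["Membership Duration"] = (
    toy["Circulation Active Year"] - toy["Year Patron Registered"]
) * 12 + month_number

# Rows without circulation activity have no duration; as in the cell above,
# they are set to 0 months, and the result is assigned back this time.
toy["Membership Duration"] = toy["Membership Duration"].fillna(0)
print(toy["Membership Duration"].tolist())
```

The same three steps should carry over to `df`. Whether 0 is the right replacement for patrons without any circulation activity is a modelling decision rather than a given.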

# Exercise 2.6

Working through the examples

In [47]:
df.loc[df['Total Checkouts'] > 10000]

Unnamed: 0,Patron Type Definition,Total Checkouts,Total Renewals,Age Range,Home Library Definition,Circulation Active Month,Circulation Active Year,Notice Preference Definition,Provided Email Address,Year Patron Registered,Within San Francisco County,is_adult,log_renewals
12694,Senior,12233,5416,75 years and over,Merced,Mar,2024.0,Email,True,2003,False,False,8.597297
12925,Senior,10428,1079,75 years and over,Ortega,Mar,2024.0,Email,True,2003,False,False,6.984716
54102,Senior,17663,56,65 to 74 years,Marina,Mar,2024.0,Email,True,2008,False,False,4.043051
54414,Senior,12675,1422,65 to 74 years,Chinatown,Mar,2024.0,Email,True,2003,False,False,7.260523
59475,Senior,11252,2628,65 to 74 years,Main,Mar,2024.0,Phone,False,2013,True,False,7.874359
...,...,...,...,...,...,...,...,...,...,...,...,...,...
293994,Adult,39743,659,60 to 64 years,Ortega,Mar,2024.0,Email,True,2003,True,True,6.492240
294100,Adult,10454,8663,35 to 44 years,Ortega,Mar,2024.0,Email,True,2003,True,True,9.066932
306622,Adult,10065,3897,60 to 64 years,Golden Gate Valley,Mar,2024.0,Email,True,2003,True,True,8.268219
312053,Adult,14825,3915,25 to 34 years,Park,Mar,2024.0,Email,True,2003,True,True,8.272826


In [49]:
row_filter = df['Total Checkouts'] > 10000
df.loc[row_filter]

Unnamed: 0,Patron Type Definition,Total Checkouts,Total Renewals,Age Range,Home Library Definition,Circulation Active Month,Circulation Active Year,Notice Preference Definition,Provided Email Address,Year Patron Registered,Within San Francisco County,is_adult,log_renewals
12694,Senior,12233,5416,75 years and over,Merced,Mar,2024.0,Email,True,2003,False,False,8.597297
12925,Senior,10428,1079,75 years and over,Ortega,Mar,2024.0,Email,True,2003,False,False,6.984716
54102,Senior,17663,56,65 to 74 years,Marina,Mar,2024.0,Email,True,2008,False,False,4.043051
54414,Senior,12675,1422,65 to 74 years,Chinatown,Mar,2024.0,Email,True,2003,False,False,7.260523
59475,Senior,11252,2628,65 to 74 years,Main,Mar,2024.0,Phone,False,2013,True,False,7.874359
...,...,...,...,...,...,...,...,...,...,...,...,...,...
293994,Adult,39743,659,60 to 64 years,Ortega,Mar,2024.0,Email,True,2003,True,True,6.492240
294100,Adult,10454,8663,35 to 44 years,Ortega,Mar,2024.0,Email,True,2003,True,True,9.066932
306622,Adult,10065,3897,60 to 64 years,Golden Gate Valley,Mar,2024.0,Email,True,2003,True,True,8.268219
312053,Adult,14825,3915,25 to 34 years,Park,Mar,2024.0,Email,True,2003,True,True,8.272826


In [53]:
row_filter = (df['Patron Type Definition'] == 'Senior') & (df['Notice Preference Definition'] == 'Email')
df.loc[row_filter]
# Returns all rows whose Patron Type Definition is 'Senior' and whose Notice Preference Definition is 'Email'.
# The filter does not use Age Range; the seniors shown happen to fall into the two oldest age groups.

Unnamed: 0,Patron Type Definition,Total Checkouts,Total Renewals,Age Range,Home Library Definition,Circulation Active Month,Circulation Active Year,Notice Preference Definition,Provided Email Address,Year Patron Registered,Within San Francisco County,is_adult,log_renewals
0,Senior,5,0,75 years and over,Main,Nov,2022.0,Email,True,2015,False,False,0.0
5,Senior,0,0,75 years and over,Main,Apr,2021.0,Email,True,2009,False,False,0.0
27,Senior,0,0,75 years and over,Main,Oct,2023.0,Email,True,2016,False,False,0.0
28,Senior,0,0,65 to 74 years,Main,Mar,2024.0,Email,True,2016,False,False,0.0
29,Senior,0,0,65 to 74 years,Main,Aug,2023.0,Email,True,2020,False,False,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
450101,Senior,0,0,75 years and over,Main,Mar,2024.0,Email,True,2013,False,False,0.0
450102,Senior,0,0,65 to 74 years,Noe Valley,Mar,2024.0,Email,True,2019,False,False,0.0
450142,Senior,0,0,65 to 74 years,Main,Nov,2022.0,Email,True,2021,,False,0.0
450143,Senior,0,0,65 to 74 years,Main,Mar,2024.0,Email,True,2019,,False,0.0


In [55]:
filter1 = (df['Total Checkouts'] >= 20) & (df['Total Checkouts'] <= 80)
filter2 =  df['Total Checkouts'].between(20, 80)
all(filter1 == filter2)

True

In [57]:
filter1

0         False
1         False
2         False
3         False
4         False
          ...  
450354    False
450355    False
450356    False
450357    False
450358    False
Name: Total Checkouts, Length: 450359, dtype: bool

In [59]:
filter2

0         False
1         False
2         False
3         False
4         False
          ...  
450354    False
450355    False
450356    False
450357    False
450358    False
Name: Total Checkouts, Length: 450359, dtype: bool
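The two filters agree because `between` includes both bounds by default. A small sketch on invented checkout counts shows the effect of the `inclusive` argument (the string values assume pandas 1.3 or later) and a vectorised alternative to the built-in `all`:

```python
import pandas as pd

# Hypothetical checkout counts around the two bounds.
checkouts = pd.Series([5, 20, 50, 80, 81])

both = checkouts.between(20, 80)                          # bounds included
neither = checkouts.between(20, 80, inclusive="neither")  # bounds excluded
manual = (checkouts >= 20) & (checkouts <= 80)

# Series.equals compares both Series element by element in one call.
same = manual.equals(both)
print(both.tolist(), neither.tolist(), same)
```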

Working on Exercise 2.6

Task 1: Filter the dataset for children under 10 years of age. How many entries do you get?

In [66]:
df

Unnamed: 0,Patron Type Definition,Total Checkouts,Total Renewals,Age Range,Home Library Definition,Circulation Active Month,Circulation Active Year,Notice Preference Definition,Provided Email Address,Year Patron Registered,Within San Francisco County,is_adult,log_renewals
0,Senior,5,0,75 years and over,Main,Nov,2022.0,Email,True,2015,False,False,0.000000
1,Adult,0,0,45 to 54 years,Main,Jul,2023.0,Email,True,2019,False,True,0.000000
2,Adult,0,0,55 to 59 years,Western Addition,Mar,2024.0,Email,True,2022,False,True,0.000000
3,Welcome,1,1,20 to 24 years,Richmond,Aug,2022.0,Email,True,2022,False,False,0.693147
4,Senior,0,0,65 to 74 years,Sunset,Mar,2024.0,Print,False,2023,False,False,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
450354,Digital Access Card,0,0,35 to 44 years,Ingleside,,,Email,False,2023,,False,0.000000
450355,Digital Access Card,0,0,45 to 54 years,Ingleside,Aug,2022.0,Print,False,2022,,False,0.000000
450356,Digital Access Card,0,0,25 to 34 years,Ingleside,Aug,2022.0,Print,False,2022,,False,0.000000
450357,Digital Access Card,0,0,35 to 44 years,Ingleside,Apr,2022.0,Print,False,2022,,False,0.000000


In [70]:
df['Age Range'].unique()

array(['75 years and over', '45 to 54 years', '55 to 59 years',
       '20 to 24 years', '65 to 74 years', '10 to 19 years',
       '25 to 34 years', '60 to 64 years', '35 to 44 years',
       '0 to 9 years', nan], dtype=object)

In [72]:
df.loc[df['Age Range'] == '0 to 9 years']

Unnamed: 0,Patron Type Definition,Total Checkouts,Total Renewals,Age Range,Home Library Definition,Circulation Active Month,Circulation Active Year,Notice Preference Definition,Provided Email Address,Year Patron Registered,Within San Francisco County,is_adult,log_renewals
14,Teacher Card,0,0,0 to 9 years,Main,,,Email,True,2023,False,False,0.000000
1067,Adult,0,0,0 to 9 years,Richmond,Dec,2022.0,Phone,False,2022,False,True,0.000000
1487,Juvenile,0,0,0 to 9 years,Main,,,Email,True,2020,False,False,0.000000
1488,Juvenile,0,0,0 to 9 years,Main,,,Email,True,2020,False,False,0.000000
1491,Juvenile,0,0,0 to 9 years,Main,Jun,2023.0,Email,True,2017,False,False,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
450241,Juvenile,0,0,0 to 9 years,Bernal Heights,,,Email,True,2022,,False,0.000000
450244,Juvenile,2,2,0 to 9 years,Bayview,Oct,2023.0,Email,False,2023,,False,1.098612
450245,Juvenile,0,0,0 to 9 years,Bayview,,,Email,False,2023,,False,0.000000
450246,Juvenile,2,2,0 to 9 years,Bayview,Oct,2023.0,Email,False,2023,,False,1.098612


Solution: There are 33180 people under 10 years of age.
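Instead of reading the row count from the displayed table, the count can be computed directly from the boolean filter. A minimal sketch on an invented four-row column:

```python
import pandas as pd

# Hypothetical sample of the Age Range column, including a missing value.
toy = pd.DataFrame(
    {"Age Range": ["0 to 9 years", "25 to 34 years", "0 to 9 years", None]}
)

is_under_10 = toy["Age Range"] == "0 to 9 years"

# True counts as 1, so the sum of the filter is the number of matching rows.
n_under_10 = int(is_under_10.sum())
same_count = len(toy.loc[is_under_10])
print(n_under_10, same_count)
```

Missing age ranges compare as `False` and are therefore not counted.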

Task 2: Are there people with more than 20000 checkouts?

In [81]:
df.loc[df['Total Checkouts'] > 20000]

Unnamed: 0,Patron Type Definition,Total Checkouts,Total Renewals,Age Range,Home Library Definition,Circulation Active Month,Circulation Active Year,Notice Preference Definition,Provided Email Address,Year Patron Registered,Within San Francisco County,is_adult,log_renewals
60282,Senior,20561,1021,65 to 74 years,Main,Mar,2024.0,Phone,False,2004,True,False,6.929517
60622,Senior,23134,617,65 to 74 years,Main,Mar,2024.0,Print,False,2003,True,False,6.426488
60994,Senior,21890,16,75 years and over,Main,Jan,2023.0,Email,True,2005,True,False,2.833213
179460,Adult,35176,62,25 to 34 years,Sunset,Mar,2024.0,Email,True,2010,True,True,4.143135
234329,Adult,23345,6458,55 to 59 years,Chinatown,Mar,2024.0,Email,True,2003,True,True,8.77323
234381,Adult,24386,387,45 to 54 years,Chinatown,Mar,2024.0,Phone,False,2003,True,True,5.961005
293994,Adult,39743,659,60 to 64 years,Ortega,Mar,2024.0,Email,True,2003,True,True,6.49224


Yes, there are 7 people with more than 20000 checkouts.
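A yes/no question of this kind can also be answered without displaying any rows, by reducing the filter with `any`. A brief sketch using four checkout counts that appear in the outputs above:

```python
import pandas as pd

# Four checkout counts taken from rows shown in this notebook.
checkouts = pd.Series([5, 0, 35176, 39743])

above_threshold = checkouts > 20000
has_heavy_user = bool(above_threshold.any())   # is there at least one?
n_heavy_users = int(above_threshold.sum())     # how many?
print(has_heavy_user, n_heavy_users)
```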

Task 3: How many people come from the Richmond district?

In [88]:
df.loc[df['Home Library Definition'] == 'Richmond']

Unnamed: 0,Patron Type Definition,Total Checkouts,Total Renewals,Age Range,Home Library Definition,Circulation Active Month,Circulation Active Year,Notice Preference Definition,Provided Email Address,Year Patron Registered,Within San Francisco County,is_adult,log_renewals
3,Welcome,1,1,20 to 24 years,Richmond,Aug,2022.0,Email,True,2022,False,False,0.693147
228,Senior,0,0,65 to 74 years,Richmond,Feb,2024.0,Email,True,2016,False,False,0.000000
229,Senior,0,0,75 years and over,Richmond,Feb,2024.0,Email,True,2016,False,False,0.000000
230,Senior,7,19,65 to 74 years,Richmond,Mar,2024.0,Email,True,2023,False,False,2.995732
231,Senior,0,0,65 to 74 years,Richmond,Jan,2024.0,Email,True,2019,False,False,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
450159,Adult,0,0,25 to 34 years,Richmond,,,Email,True,2023,,True,0.000000
450198,Juvenile,0,0,10 to 19 years,Richmond,Jul,2023.0,Email,False,2023,,False,0.000000
450255,Teen,1,0,20 to 24 years,Richmond,Feb,2024.0,Email,False,2023,,False,0.000000
450256,Teen,1288,109,10 to 19 years,Richmond,Aug,2023.0,Email,True,2009,,False,4.700480


21433 people have Richmond as their home library and are counted here as coming from the Richmond district.
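When the counts for several branches are of interest, `value_counts` returns them all at once, sorted by frequency. A sketch on an invented six-row column:

```python
import pandas as pd

# Hypothetical sample of the Home Library Definition column.
home_library = pd.Series(
    ["Richmond", "Main", "Richmond", "Sunset", "Main", "Richmond"],
    name="Home Library Definition",
)

# One count per distinct value, most frequent first.
counts = home_library.value_counts()
n_richmond = int(counts["Richmond"])
print(counts)
```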

Task 4: What percentage of the observations have a Membership Duration of zero months?

I have no solution here, because the calculation of the Membership Duration in Exercise 2.5 did not work for me.
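Once a Membership Duration column exists, the percentage could be obtained as the mean of a boolean filter. The sketch below uses four invented durations, so the printed value says nothing about the real dataset.

```python
import pandas as pd

# Hypothetical membership durations in months (values invented).
duration = pd.Series([95.0, 27.0, 0.0, 0.0])

# The mean of a boolean Series is the share of True values.
share_zero = (duration == 0).mean() * 100
print(f"{share_zero:.1f} % of the observations have a duration of zero months")
```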

# Exercise 2.9

In [93]:
df.head(100)

Unnamed: 0,Patron Type Definition,Total Checkouts,Total Renewals,Age Range,Home Library Definition,Circulation Active Month,Circulation Active Year,Notice Preference Definition,Provided Email Address,Year Patron Registered,Within San Francisco County,is_adult,log_renewals
0,Senior,5,0,75 years and over,Main,Nov,2022.0,Email,True,2015,False,False,0.000000
1,Adult,0,0,45 to 54 years,Main,Jul,2023.0,Email,True,2019,False,True,0.000000
2,Adult,0,0,55 to 59 years,Western Addition,Mar,2024.0,Email,True,2022,False,True,0.000000
3,Welcome,1,1,20 to 24 years,Richmond,Aug,2022.0,Email,True,2022,False,False,0.693147
4,Senior,0,0,65 to 74 years,Sunset,Mar,2024.0,Print,False,2023,False,False,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,Senior,81,37,75 years and over,Main,Jun,2023.0,Email,True,2003,False,False,3.637586
96,Senior,0,0,65 to 74 years,Main,Oct,2023.0,,False,2022,False,False,0.000000
97,Senior,0,0,65 to 74 years,Main,Mar,2024.0,Email,True,2019,False,False,0.000000
98,Senior,0,0,65 to 74 years,Main,Mar,2024.0,Email,True,2023,False,False,0.000000


In [95]:
df.describe(include='all')

Unnamed: 0,Patron Type Definition,Total Checkouts,Total Renewals,Age Range,Home Library Definition,Circulation Active Month,Circulation Active Year,Notice Preference Definition,Provided Email Address,Year Patron Registered,Within San Francisco County,is_adult,log_renewals
count,450359,450359.0,450359.0,449849,450329,410042,410042.0,447044,450359,450359.0,450138,450359,450359.0
unique,17,,,10,29,12,,3,2,,2,2,
top,Adult,,,25 to 34 years,Main,Mar,,Email,True,,True,True,
freq,284671,,,95278,145932,150957,,409071,408532,,377329,284671,
mean,,155.218053,78.968929,,,,2022.934855,,,2016.094902,,,1.988588
std,,526.242567,305.915233,,,,1.419927,,,6.278639,,,2.12158
min,,0.0,0.0,,,,2004.0,,,2003.0,,,0.0
25%,,0.0,0.0,,,,2022.0,,,2012.0,,,0.0
50%,,7.0,3.0,,,,2023.0,,,2018.0,,,1.386294
75%,,74.0,36.0,,,,2024.0,,,2021.0,,,3.610918


In [99]:
print(df['Total Renewals'].min())

0


In [103]:
df['Total Renewals'].sum()

35564368

In [105]:
df['Total Renewals'].between(100, 200).sum()

25835

In [109]:
df.columns

Index(['Patron Type Definition', 'Total Checkouts', 'Total Renewals',
       'Age Range', 'Home Library Definition', 'Circulation Active Month',
       'Circulation Active Year', 'Notice Preference Definition',
       'Provided Email Address', 'Year Patron Registered',
       'Within San Francisco County', 'is_adult', 'log_renewals'],
      dtype='object')

In [111]:
df.dtypes

Patron Type Definition           object
Total Checkouts                   int64
Total Renewals                    int64
Age Range                        object
Home Library Definition          object
Circulation Active Month         object
Circulation Active Year         float64
Notice Preference Definition     object
Provided Email Address             bool
Year Patron Registered            int64
Within San Francisco County      object
is_adult                           bool
log_renewals                    float64
dtype: object

In [113]:
df.shape

(450359, 13)