# Pandas 

* **Pandas** is a Python library for working with datasets.
* It lets us analyze, clean, explore, and manipulate data.
* Pandas relies mainly on two data structures to store data:
    * **Series**: A one-dimensional array with data labels, called the index, that can hold any type of data. It is similar to a column in a spreadsheet.
    * **DataFrame**: A two-dimensional, mutable table with labeled axes (rows and columns). It resembles a spreadsheet or a SQL table and can contain multiple Series objects of different data types.

### Importance

* Pandas is one of the most popular and widely used libraries for working with data.
* It provides functions for data manipulation, from simple aggregation to complex joining and merging of datasets.
* It lets us analyze large datasets and apply statistics.
* It works well with other Python libraries, which extend its capabilities for numerical computation and visualization.

## Tabular Data

**Tabular data** is any dataset that can be organized into **rows and columns**, essentially a **two-dimensional matrix**.

Besides data structured in tables, there are other kinds of **non-tabular** data.

Each of these requires **specific methods and formats** to be represented and processed,  
while tabular data is among the simplest to handle and analyze.

![image.png](attachment:image.png)

## Mixed Data Types

Our data is often made up of **mixed types**, such as **integers**, **decimal numbers**, and **strings**.  
For example, suppose you are collecting basic medical information from a patient:

- **Height and weight** → numeric (float)
- **Age** → integer (int)
- **Blood type** → category (string)

---

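### The Problem with NumPy

Although **NumPy** can technically store this mixed data in a single array, **it is not designed to do so** efficiently. In the example below every value is converted to a string, so `a + b` concatenates text instead of adding numbers.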

In [4]:
import numpy as np

a = np.array([6.1,150.0,25,'A-'])
b = np.array([5.6,122.0,29,'B+'])
a + b


array(['6.15.6', '150.0122.0', '2529', 'A-B+'], dtype='<U64')

In [None]:
%conda install pandas

In [5]:
import pandas as pd
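
With pandas imported, the two patient records from the NumPy example can be stored in a DataFrame, where every column keeps its own type. This is only a minimal sketch: the column names (`height`, `weight`, `age`, `blood_type`) are assumptions chosen for illustration and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical column names; the values are the two records from the NumPy example.
patients = pd.DataFrame({
    "height": [6.1, 5.6],        # float
    "weight": [150.0, 122.0],    # float
    "age": [25, 29],             # int
    "blood_type": ["A-", "B+"],  # string
})

# Each column has its own dtype, so numeric columns stay numeric.
print(patients.dtypes)

# Arithmetic now works on the numeric columns instead of concatenating text.
total_weight = patients["weight"].sum()
mean_age = patients["age"].mean()
print(total_weight, mean_age)
```

Keeping one dtype per column is what lets numeric operations and text labels coexist in the same table.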

## Series

A `Series` is one of the fundamental data structures in pandas. It is similar to a one-dimensional array (as in NumPy), but with powerful and flexible indexing capabilities.

#### Key Characteristics
- **One-dimensional**: it is essentially a single column of data.
- **Indexed**: each element in a Series has an associated label, called the *index*. The index can be numeric, a date, or a string.
- **Data types**: a Series can hold any type of data: integers, strings, floats, Python objects, and so on. However, a Series has a single data type (dtype) for all its elements; mixed values are stored under the generic `object` dtype.
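
The single-type rule in the last bullet can be checked directly. The sketch below uses made-up values and compares a Series of integers with a Series of mixed values.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
mixed = pd.Series([1, 2.5, "three"])

print(ints.dtype)   # an integer dtype
print(mixed.dtype)  # object: a generic container for Python objects

# Inside the object Series each value keeps its own Python type.
value_types = [type(v).__name__ for v in mixed]
print(value_types)
```

An `object` Series is flexible, but it gives up the fast numeric operations available with a homogeneous dtype.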


In [6]:
# Creating a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data, index=['a', 'b', 'c', 'd', 'e'])
series

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [7]:
series.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

## Relationship with NumPy Arrays

As you may have noticed, building a **Pandas Series** is similar to building a **NumPy array**.  
The **main difference** is the **index** in Pandas. With Pandas we can assign custom labels to the index (for example 'mon', 'tue', 'wed'), which makes the data more readable and easier to manipulate than with the default numeric indices.


Once we have created our Series, we can retrieve the data as a NumPy array:

In [8]:
series.values

array([10, 20, 30, 40, 50])
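
As a small illustration of what the index adds, the sketch below rebuilds the same Series and reads one value in three ways: by label, by position, and from the underlying NumPy array. `to_numpy()` is used here as an alternative to `.values`.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_label = series["b"]        # look up by index label
by_position = series.iloc[1]  # look up by integer position
as_array = series.to_numpy()  # plain NumPy array, labels are dropped

print(by_label, by_position, as_array[1])
```

All three expressions return the same value; only the Series keeps track of the label `'b'`.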

## DataFrame

- A DataFrame is a labeled two-dimensional data structure with columns of potentially different types.
- It is similar to a table in a relational database or a spreadsheet.

#### Rows and Columns

- DataFrames are made up of rows and columns, where each row represents a single observation or record, and each column represents a variable or feature.


In [9]:
import pandas as pd

data = {
    'job_title': ['Data Analyst', 'Data Scientist', 'Data Engineer'],
    'location': ['Italy', 'USA', 'Germany'],
    'salary': [40000, 80000, 70000],
    'date_posted': ['2024-12-01', '2024-11-15', '2024-12-10']  # YYYY-MM-DD format
}

df = pd.DataFrame(data)
df


Unnamed: 0,job_title,location,salary,date_posted
0,Data Analyst,Italy,40000,2024-12-01
1,Data Scientist,USA,80000,2024-11-15
2,Data Engineer,Germany,70000,2024-12-10


#### Column Names

- Column names provide a label for each column in the DataFrame.
- They make it easy to reference and manipulate the data.


In [10]:
df['job_title']

0      Data Analyst
1    Data Scientist
2     Data Engineer
Name: job_title, dtype: object

In [11]:
type(df['job_title'])

pandas.core.series.Series

**Note:** You can also get information about a column's name, length, and data type (dtype).

You can also access a column using **dot notation**.

In [12]:
df.job_title

0      Data Analyst
1    Data Scientist
2     Data Engineer
Name: job_title, dtype: object

In [13]:
df.columns

Index(['job_title', 'location', 'salary', 'date_posted'], dtype='object')

#### Index

- DataFrames have an index, which provides a label for each row.  
- By default, it is a sequence of integers starting from 0, but it can be customized.

In [14]:
# Access a row by index
df.job_title[2]

'Data Engineer'
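
To illustrate a customized index, the sketch below builds a reduced version of the jobs table with string row labels. The labels `job_a`, `job_b`, and `job_c` and the name `df_jobs` are hypothetical and used only for this example.

```python
import pandas as pd

df_jobs = pd.DataFrame(
    {
        "job_title": ["Data Analyst", "Data Scientist", "Data Engineer"],
        "salary": [40000, 80000, 70000],
    },
    index=["job_a", "job_b", "job_c"],  # hypothetical row labels
)

print(df_jobs.index)

# Select a single value by row label and column name.
engineer_title = df_jobs.loc["job_c", "job_title"]
print(engineer_title)
```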

## Loading Data

- We can load data from a CSV file using `pd.read_csv()`.
- We can also load data from an Excel file using `pd.read_excel()`.
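
Since `pd.read_csv()` accepts any file-like object, it can be tried without a file on disk. The sketch below reads a small in-memory CSV. The three rows are illustrative sample data, and `parse_dates` is an optional argument that asks pandas to convert the `Date` column to datetimes.

```python
import io
import pandas as pd

# A small in-memory CSV standing in for a file on disk (illustrative rows).
csv_text = """Date,Sales
2012-01-01,266.0
2012-02-01,145.9
2012-03-01,183.1
"""

sales = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(sales.dtypes)
print(sales.shape)
```

`pd.read_excel()` is called in a similar way with a path to a workbook, but it needs an extra engine package such as `openpyxl` to be installed.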


### Example

In [16]:
# Loading Data
df = pd.read_csv('../../data/shampoo_sales.csv')
df

Unnamed: 0,Date,Sales
0,01-01-2012,266.0
1,01-02-2012,145.9
2,01-03-2012,183.1
3,01-04-2012,119.3
4,01-05-2012,180.3
5,01-06-2012,168.5
6,01-07-2012,231.8
7,01-08-2012,224.5
8,01-09-2012,192.8
9,01-10-2012,122.9


## Dataset

There are several websites where you can find datasets to use.

- The best known is [kaggle.com/datasets](https://www.kaggle.com/datasets)

- UCI Machine Learning Repository (archive.ics.uci.edu) is a long-standing collection of classic datasets for classification, regression, and clustering.

- Google Dataset Search (datasetsearch.research.google.com) is a search engine for public datasets.

- For NLP and advanced AI (language models, translation, etc.): [https://huggingface.co](https://huggingface.co/)

Some well-known datasets for regression and classification:

- Titanic – Predicting passenger survival (https://www.kaggle.com/competitions/titanic)

- Ames Housing – Predicting house prices (https://www.kaggle.com/datasets/shashanknecrothapa/ames-housing-dataset)

- Iris Dataset – Flower classification (a classic dataset) (https://archive.ics.uci.edu/dataset/53/iris)

We will not use them now, but some libraries also ship datasets, for example:

```python
from sklearn.datasets import load_iris
```

or

```python
import tensorflow_datasets as tfds
ds, info = tfds.load("mnist", with_info=True)
```

In [17]:
import seaborn as sns

df = sns.load_dataset("titanic")
print(df.head())

   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third   
1         1       1  female  38.0      1      0  71.2833        C  First   
2         1       3  female  26.0      0      0   7.9250        S  Third   
3         1       1  female  35.0      1      0  53.1000        S  First   
4         0       3    male  35.0      0      0   8.0500        S  Third   

     who  adult_male deck  embark_town alive  alone  
0    man        True  NaN  Southampton    no  False  
1  woman       False    C    Cherbourg   yes  False  
2  woman       False  NaN  Southampton   yes   True  
3  woman       False    C  Southampton   yes  False  
4    man        True  NaN  Southampton    no   True  


In [18]:
# wine quality dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
df = pd.read_csv(url, sep=';') 
print(df.head())


   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.4              0.70         0.00             1.9      0.076   
1            7.8              0.88         0.00             2.6      0.098   
2            7.8              0.76         0.04             2.3      0.092   
3           11.2              0.28         0.56             1.9      0.075   
4            7.4              0.70         0.00             1.9      0.076   

   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                 11.0                  34.0   0.9978  3.51       0.56   
1                 25.0                  67.0   0.9968  3.20       0.68   
2                 15.0                  54.0   0.9970  3.26       0.65   
3                 17.0                  60.0   0.9980  3.16       0.58   
4                 11.0                  34.0   0.9978  3.51       0.56   

   alcohol  quality  
0      9.4        5  
1      9.8        5  
2      9.8        5 

In [None]:
# alcohol consumption dataset
df = pd.read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv")
print(df.head())


In [None]:
# COVID-19 dataset
df = pd.read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
print(df.head())

In [None]:
%conda install datasets

We can also load data directly from a URL.
On GitHub, the important thing is to use the raw version of the file.
When you view a file on GitHub (for example a .csv, .txt, or .py file), you are looking at an HTML preview page, not the file itself.
Clicking **Raw** takes you to the plain, unrendered version of the file.
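
The relationship between the preview page and the raw file can also be expressed in code. The helper below, `github_blob_to_raw`, is a hypothetical convenience function written for this example (it is not part of pandas or of any GitHub library). It assumes the usual `https://github.com/<user>/<repo>/blob/<branch>/<path>` layout of a preview URL, and the preview URL shown is inferred from the raw URL of the drinks dataset used earlier.

```python
def github_blob_to_raw(url: str) -> str:
    """Turn a GitHub 'blob' preview URL into its raw.githubusercontent.com equivalent."""
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        raise ValueError("not a GitHub blob URL")
    path = url[len(prefix):]
    # Drop only the first '/blob/' segment:
    # user/repo/blob/branch/file -> user/repo/branch/file
    return "https://raw.githubusercontent.com/" + path.replace("/blob/", "/", 1)


# Assumed preview URL for the drinks dataset loaded earlier.
page_url = "https://github.com/fivethirtyeight/data/blob/master/alcohol-consumption/drinks.csv"
raw_url = github_blob_to_raw(page_url)
print(raw_url)
```

The resulting string is the kind of URL that can be passed to `pd.read_csv()`.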

In [None]:
url = "https://raw.githubusercontent.com/lauranenzi/ProgrammingLab_II/refs/heads/main/data/shampoo_sales.csv"
df = pd.read_csv(url)
df.head()


In [19]:
from datasets import load_dataset


In [21]:
# Load the dataset
dataset = load_dataset("yiqing111/Engineering_Jobs_Insight_Dataset")

# Convert to a pandas DataFrame
df = dataset['train'].to_pandas()

df 

Repo card metadata block was not found. Setting CardData to empty.


Unnamed: 0,Job Title,Company,Description,Location,Salary Min,Salary Max,Date Posted,URL
0,Senior Software Engineer (Python),BP Energy,Entity: Trading & Shipping Job Family Group: S...,"Crestwood, Houston",138992.40,138992.40,2024-10-29T16:35:26Z,https://www.adzuna.com/land/ad/4917931721?se=N...
1,Sr. Backend Software Engineer,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",118638.80,118638.80,2024-11-10T01:13:11Z,https://www.adzuna.com/land/ad/4933370156?se=N...
2,Sr. Software Engineer - Mobile,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",108041.95,108041.95,2024-10-15T11:51:30Z,https://www.adzuna.com/land/ad/4902683574?se=N...
3,Acquisition Software Engineer,Naval Air Systems Command,Position Description The Harpoon/SLAM ER/JSOW ...,"China Lake, Kern County",88583.57,88583.57,2024-11-16T04:21:41Z,https://www.adzuna.com/land/ad/4941260438?se=N...
4,Senior Software Engineer,Innova,A client of Innova Solutions is immediately hi...,"Richardson, Dallas",121932.35,121932.35,2024-11-15T09:42:55Z,https://www.adzuna.com/details/4940271538?utm_...
...,...,...,...,...,...,...,...,...
11180,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"San Francisco, California",110000.00,160000.00,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400312?utm_...
11181,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"Austin, Travis County",110000.00,160000.00,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400320?utm_...
11182,Blockchain Developer (Rust/Move),Supra,Who We Are Supra is pioneering the future of i...,"Perham, Otter Tail County",126635.63,126635.63,2024-08-01T10:52:59Z,https://www.adzuna.com/details/4805320117?utm_...
11183,Blockchain Ecosystem Developer Advocate - USA ...,Crypto Recruit,Hyperfast blockchain building out ecosystem gl...,US,155731.52,155731.52,2024-11-06T09:13:29Z,https://www.adzuna.com/details/4928326240?utm_...


## Viewing the Data

- View the first rows of the DataFrame using `head()` (5 rows by default).
- View the last rows of the DataFrame using `tail()` (5 rows by default).

These methods are useful for getting a quick overview of the DataFrame. They are ideal for exploratory analysis and for understanding the structure of the data.


In [22]:
df.head()

Unnamed: 0,Job Title,Company,Description,Location,Salary Min,Salary Max,Date Posted,URL
0,Senior Software Engineer (Python),BP Energy,Entity: Trading & Shipping Job Family Group: S...,"Crestwood, Houston",138992.4,138992.4,2024-10-29T16:35:26Z,https://www.adzuna.com/land/ad/4917931721?se=N...
1,Sr. Backend Software Engineer,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",118638.8,118638.8,2024-11-10T01:13:11Z,https://www.adzuna.com/land/ad/4933370156?se=N...
2,Sr. Software Engineer - Mobile,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",108041.95,108041.95,2024-10-15T11:51:30Z,https://www.adzuna.com/land/ad/4902683574?se=N...
3,Acquisition Software Engineer,Naval Air Systems Command,Position Description The Harpoon/SLAM ER/JSOW ...,"China Lake, Kern County",88583.57,88583.57,2024-11-16T04:21:41Z,https://www.adzuna.com/land/ad/4941260438?se=N...
4,Senior Software Engineer,Innova,A client of Innova Solutions is immediately hi...,"Richardson, Dallas",121932.35,121932.35,2024-11-15T09:42:55Z,https://www.adzuna.com/details/4940271538?utm_...


You can also specify the number of rows to display.  
For example, to see the first **10 rows**, use `head(10)`.


In [23]:
df.head(10)

Unnamed: 0,Job Title,Company,Description,Location,Salary Min,Salary Max,Date Posted,URL
0,Senior Software Engineer (Python),BP Energy,Entity: Trading & Shipping Job Family Group: S...,"Crestwood, Houston",138992.4,138992.4,2024-10-29T16:35:26Z,https://www.adzuna.com/land/ad/4917931721?se=N...
1,Sr. Backend Software Engineer,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",118638.8,118638.8,2024-11-10T01:13:11Z,https://www.adzuna.com/land/ad/4933370156?se=N...
2,Sr. Software Engineer - Mobile,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",108041.95,108041.95,2024-10-15T11:51:30Z,https://www.adzuna.com/land/ad/4902683574?se=N...
3,Acquisition Software Engineer,Naval Air Systems Command,Position Description The Harpoon/SLAM ER/JSOW ...,"China Lake, Kern County",88583.57,88583.57,2024-11-16T04:21:41Z,https://www.adzuna.com/land/ad/4941260438?se=N...
4,Senior Software Engineer,Innova,A client of Innova Solutions is immediately hi...,"Richardson, Dallas",121932.35,121932.35,2024-11-15T09:42:55Z,https://www.adzuna.com/details/4940271538?utm_...
5,Software Engineering Manager,Softworld Inc,Job Title: 80623 - Software Engineering Manage...,"Bridgeton, Saint Louis County",133348.23,133348.23,2024-11-06T20:44:34Z,https://www.adzuna.com/details/4929157424?utm_...
6,Java Software Engineer,Volt,Move Forward with Volt Volt is immediately hir...,"Aurora, Arapahoe County",88769.2,88769.2,2024-11-21T10:41:29Z,https://www.adzuna.com/details/4946736176?utm_...
7,Java Software Engineer,Volt,Move Forward with Volt Volt is immediately hir...,"Murphy, Collin County",79830.78,79830.78,2024-11-21T10:41:27Z,https://www.adzuna.com/details/4946736156?utm_...
8,Principal Software Engineer,Volt,Move Forward with Volt Volt is immediately hir...,"Murphy, Collin County",94173.15,94173.15,2024-11-09T09:54:47Z,https://www.adzuna.com/details/4932744306?utm_...
9,Software Engineer- .net,Schneider Electric,"For this U.S. based position, the expected com...","Louisville, Jefferson County",91324.85,91324.85,2024-11-14T02:51:37Z,https://www.adzuna.com/land/ad/4938501621?se=N...


In [24]:
df.tail()

Unnamed: 0,Job Title,Company,Description,Location,Salary Min,Salary Max,Date Posted,URL
11180,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"San Francisco, California",110000.0,160000.0,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400312?utm_...
11181,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"Austin, Travis County",110000.0,160000.0,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400320?utm_...
11182,Blockchain Developer (Rust/Move),Supra,Who We Are Supra is pioneering the future of i...,"Perham, Otter Tail County",126635.63,126635.63,2024-08-01T10:52:59Z,https://www.adzuna.com/details/4805320117?utm_...
11183,Blockchain Ecosystem Developer Advocate - USA ...,Crypto Recruit,Hyperfast blockchain building out ecosystem gl...,US,155731.52,155731.52,2024-11-06T09:13:29Z,https://www.adzuna.com/details/4928326240?utm_...
11184,Backend Developer - Java JVM Blockchain,Crypto Recruit,US HQ Blockchain seeking Java JVM Senior Devs ...,US,174527.5,174527.5,2024-11-06T09:13:29Z,https://www.adzuna.com/details/4928326234?utm_...


In [25]:
df.tail(10)

Unnamed: 0,Job Title,Company,Description,Location,Salary Min,Salary Max,Date Posted,URL
11175,Architect - Microsoft Security Solutions,DGR Systems LLC,"DGR Systems, a growing premier technology cons...","Tampa, Hillsborough County",100000.0,150000.0,2024-09-29T13:50:19Z,https://www.adzuna.com/details/4881427957?utm_...
11176,Principal Business Architect - Health Insuranc...,Molina Enterprise,Description Job Description Job Summary Critic...,"Arizona, US",173759.23,173759.23,2024-11-11T05:24:32Z,https://www.adzuna.com/details/4934323040?utm_...
11177,Functional Solutions Consultant - NetSuite ACS...,CLBPTS,Description Do you want to advance your career...,US,115700.0,189500.0,2024-11-16T05:27:41Z,https://www.adzuna.com/details/4941274924?utm_...
11178,Functional Solutions Consultant - NetSuite ACS...,CLBPTS,Description Do you want to advance your career...,US,88100.0,192000.0,2024-09-01T06:11:35Z,https://www.adzuna.com/details/4847225282?utm_...
11179,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"New York City, New York",110000.0,160000.0,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400317?utm_...
11180,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"San Francisco, California",110000.0,160000.0,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400312?utm_...
11181,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"Austin, Travis County",110000.0,160000.0,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400320?utm_...
11182,Blockchain Developer (Rust/Move),Supra,Who We Are Supra is pioneering the future of i...,"Perham, Otter Tail County",126635.63,126635.63,2024-08-01T10:52:59Z,https://www.adzuna.com/details/4805320117?utm_...
11183,Blockchain Ecosystem Developer Advocate - USA ...,Crypto Recruit,Hyperfast blockchain building out ecosystem gl...,US,155731.52,155731.52,2024-11-06T09:13:29Z,https://www.adzuna.com/details/4928326240?utm_...
11184,Backend Developer - Java JVM Blockchain,Crypto Recruit,US HQ Blockchain seeking Java JVM Senior Devs ...,US,174527.5,174527.5,2024-11-06T09:13:29Z,https://www.adzuna.com/details/4928326234?utm_...


## Getting Information

To get a concise summary of the DataFrame we use `df.info()`.

It gives us:
- Total number of entries (rows)
- Number of columns
- Name of each column
- Count of non-null values for each column
- Data type of each column

It is useful for exploring the data and getting a quick overview of the dataset.

### Example

Let's use `info()` on our DataFrame.

**Note**: In our DataFrame, the `Date Posted` column is not a datetime object, but we will fix that later.


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11185 entries, 0 to 11184
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Job Title    11185 non-null  object 
 1   Company      11181 non-null  object 
 2   Description  11185 non-null  object 
 3   Location     11185 non-null  object 
 4   Salary Min   11185 non-null  float64
 5   Salary Max   11179 non-null  float64
 6   Date Posted  11185 non-null  object 
 7   URL          11185 non-null  object 
dtypes: float64(2), object(6)
memory usage: 699.2+ KB
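
The `Date Posted` column appears as `object` in the summary because its values are stored as text. As a rough preview of one way such text could be converted, the sketch below applies `pd.to_datetime` to two timestamps copied from the table. It works on a separate sample Series and is not necessarily the approach used later in the course.

```python
import pandas as pd

# Two timestamps copied from the 'Date Posted' column, stored as plain strings.
sample = pd.Series(
    ["2024-10-29T16:35:26Z", "2024-11-10T01:13:11Z"],
    name="Date Posted",
)

# utc=True keeps the trailing 'Z' (UTC) information in the resulting dtype.
parsed = pd.to_datetime(sample, utc=True)

print(parsed.dtype)
months = parsed.dt.month.tolist()
print(months)
```

Once a column has a datetime dtype, the `.dt` accessor exposes components such as year, month, and day.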


`df.describe()` returns summary statistics for the numeric columns:

In [27]:
df.describe()

Unnamed: 0,Salary Min,Salary Max
count,11185.0,11179.0
mean,110393.745941,123339.933299
std,40847.621742,55571.000358
min,0.0,1.0
25%,83130.58,85820.28
50%,105012.47,113068.15
75%,133000.0,147635.98
max,550000.0,809123.0


In [28]:
df['Job Title'].unique()

array(['Senior Software Engineer (Python)',
       'Sr. Backend Software Engineer', 'Sr. Software Engineer - Mobile',
       ..., 'Blockchain Developer (Rust/Move)',
       'Blockchain Ecosystem Developer Advocate - USA Canada',
       'Backend Developer - Java JVM Blockchain'],
      shape=(4848,), dtype=object)

There are also other ways to get an overview of the DataFrame, such as:

- `len(df)` - Returns the length of the DataFrame (number of rows).
- `df.shape` - Returns a tuple with the number of rows and columns `(rows, columns)`.
- `df.index` - Describes the index of the DataFrame.
- `df.columns` - Lists the column names of the DataFrame.
- `df.count()` - Counts the number of **non-null** values in each column.
- `df['column_name'].unique()` - Returns the distinct values in a column.
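
The calls in this list can be tried on a tiny table where the answers are easy to verify by eye. The DataFrame `small` below is made-up data with one missing value per column.

```python
import pandas as pd

small = pd.DataFrame({
    "city": ["Rome", "Milan", "Rome", None],
    "salary": [40000.0, 52000.0, None, 61000.0],
})

n_rows = len(small)                      # number of rows
shape = small.shape                      # (rows, columns)
column_names = list(small.columns)       # column labels
non_null = small.count().to_dict()       # non-null values per column
n_cities = small["city"].nunique()       # distinct non-missing values

print(n_rows, shape, column_names, non_null, n_cities)
```

Note that `count()` and `nunique()` skip missing values, while `len()` and `shape` count every row.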


## Accessing Data

### Variables (Columns)

- In DataFrames, columns are called **variables**.
- To view a specific column, you can use `df['column_name']` or `df.column_name` (the latter only when the name is a valid Python identifier, for example with no spaces).
- This is useful if you want to examine only some columns of the DataFrame, especially when the dataset is very large.

#### Example


In [29]:
df.columns

Index(['Job Title', 'Company', 'Description', 'Location', 'Salary Min',
       'Salary Max', 'Date Posted', 'URL'],
      dtype='object')

In [None]:
# df.Job Title raises a SyntaxError: dot notation cannot handle a space in the name
df['Job Title']

In [30]:
# Replace spaces with underscores so the columns can be accessed with dot notation
df.columns = df.columns.str.replace(' ', '_')

In [31]:
df.Job_Title

0                        Senior Software Engineer (Python)
1                            Sr. Backend Software Engineer
2                           Sr. Software Engineer - Mobile
3                            Acquisition Software Engineer
4                                 Senior Software Engineer
                               ...                        
11180    Blockchain Developer (The Decentralization Arc...
11181    Blockchain Developer (The Decentralization Arc...
11182                     Blockchain Developer (Rust/Move)
11183    Blockchain Ecosystem Developer Advocate - USA ...
11184              Backend Developer - Java JVM Blockchain
Name: Job_Title, Length: 11185, dtype: object

What if you want to view several columns at once?

You can list the column names inside a second pair of square brackets, like this:


In [32]:
df[['Job_Title', 'Location']]

Unnamed: 0,Job_Title,Location
0,Senior Software Engineer (Python),"Crestwood, Houston"
1,Sr. Backend Software Engineer,"Belmont, Kent County"
2,Sr. Software Engineer - Mobile,"Belmont, Kent County"
3,Acquisition Software Engineer,"China Lake, Kern County"
4,Senior Software Engineer,"Richardson, Dallas"
...,...,...
11180,Blockchain Developer (The Decentralization Arc...,"San Francisco, California"
11181,Blockchain Developer (The Decentralization Arc...,"Austin, Travis County"
11182,Blockchain Developer (Rust/Move),"Perham, Otter Tail County"
11183,Blockchain Ecosystem Developer Advocate - USA ...,US
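
The number of brackets also determines the type of the result. The sketch below uses a small made-up table with the same column names to show that single brackets return a `Series`, while double brackets return a `DataFrame`, even for a single column.

```python
import pandas as pd

small = pd.DataFrame({
    "Job_Title": ["Data Analyst", "Data Engineer"],
    "Location": ["Italy", "Germany"],
    "Salary_Min": [40000.0, 70000.0],
})

one_col = small["Job_Title"]                 # Series
one_col_frame = small[["Job_Title"]]         # DataFrame with one column
two_cols = small[["Job_Title", "Location"]]  # DataFrame with two columns

print(type(one_col).__name__, type(one_col_frame).__name__, two_cols.shape)
```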


### Observations (Rows)

* In DataFrames, rows are called **observations**.
* To view rows by their integer position, you can use `iloc[]`, which stands for "integer location".

#### Example


In [33]:
df.iloc[1]

Job_Title                          Sr. Backend Software Engineer
Company                                                   Meijer
Description    As a family company, we serve people and commu...
Location                                    Belmont, Kent County
Salary_Min                                              118638.8
Salary_Max                                              118638.8
Date_Posted                                 2024-11-10T01:13:11Z
URL            https://www.adzuna.com/land/ad/4933370156?se=N...
Name: 1, dtype: object

Is that correct?  
Let's use `df.head()` to view the first 5 rows and confirm that the second row (index 1) is indeed the right one.


In [34]:
df.head()

Unnamed: 0,Job_Title,Company,Description,Location,Salary_Min,Salary_Max,Date_Posted,URL
0,Senior Software Engineer (Python),BP Energy,Entity: Trading & Shipping Job Family Group: S...,"Crestwood, Houston",138992.4,138992.4,2024-10-29T16:35:26Z,https://www.adzuna.com/land/ad/4917931721?se=N...
1,Sr. Backend Software Engineer,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",118638.8,118638.8,2024-11-10T01:13:11Z,https://www.adzuna.com/land/ad/4933370156?se=N...
2,Sr. Software Engineer - Mobile,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",108041.95,108041.95,2024-10-15T11:51:30Z,https://www.adzuna.com/land/ad/4902683574?se=N...
3,Acquisition Software Engineer,Naval Air Systems Command,Position Description The Harpoon/SLAM ER/JSOW ...,"China Lake, Kern County",88583.57,88583.57,2024-11-16T04:21:41Z,https://www.adzuna.com/land/ad/4941260438?se=N...
4,Senior Software Engineer,Innova,A client of Innova Solutions is immediately hi...,"Richardson, Dallas",121932.35,121932.35,2024-11-15T09:42:55Z,https://www.adzuna.com/details/4940271538?utm_...


To view several rows at once, we can use `iloc[]` with a range of positions.  

For example, to view the **2nd through 5th** rows (index **1** to **4**), we use `[1:5]`.  

**Note**: the second number of the range is **not included**.  
So `[1:5]` returns the rows with index **1, 2, 3, and 4**, excluding index **5**.

```python
df.iloc[1:5]
```


In [35]:
df.iloc[2:5]

Unnamed: 0,Job_Title,Company,Description,Location,Salary_Min,Salary_Max,Date_Posted,URL
2,Sr. Software Engineer - Mobile,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",108041.95,108041.95,2024-10-15T11:51:30Z,https://www.adzuna.com/land/ad/4902683574?se=N...
3,Acquisition Software Engineer,Naval Air Systems Command,Position Description The Harpoon/SLAM ER/JSOW ...,"China Lake, Kern County",88583.57,88583.57,2024-11-16T04:21:41Z,https://www.adzuna.com/land/ad/4941260438?se=N...
4,Senior Software Engineer,Innova,A client of Innova Solutions is immediately hi...,"Richardson, Dallas",121932.35,121932.35,2024-11-15T09:42:55Z,https://www.adzuna.com/details/4940271538?utm_...


In [36]:
df.iloc[2:5, 0:2]

Unnamed: 0,Job_Title,Company
2,Sr. Software Engineer - Mobile,Meijer
3,Acquisition Software Engineer,Naval Air Systems Command
4,Senior Software Engineer,Innova


We can **subset** the rows of a `Series` or a `DataFrame`  
using the **index values** associated with each row through the `loc` indexer.
Index values can be names or numbers; each label is tied to its row and stays the same, for example after filtering or sorting.
`iloc`, by contrast, selects by integer position, regardless of what the index labels are.

```python
df.loc[index]
```
returns the row (or rows) whose index matches the specified value.
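
The difference between the two indexers becomes visible as soon as labels and positions stop coinciding, for instance after filtering. The sketch below uses a small made-up table (`df_small`) to show it.

```python
import pandas as pd

df_small = pd.DataFrame({"salary": [90, 120, 150, 80]})

# Filtering keeps the original index labels (here 1 and 2).
high = df_small[df_small["salary"] > 100]

first_by_position = high.iloc[0]["salary"]  # first remaining row, whatever its label
row_by_label = high.loc[1, "salary"]        # the row whose label is 1

# Label 0 was filtered out, so asking for it by label fails.
try:
    high.loc[0]
    label_zero_missing = False
except KeyError:
    label_zero_missing = True

# Slices differ too: loc includes the end label, iloc excludes the end position.
rows_loc = len(df_small.loc[1:2])    # labels 1 and 2
rows_iloc = len(df_small.iloc[1:2])  # position 1 only

print(first_by_position, row_by_label, label_zero_missing, rows_loc, rows_iloc)
```

In the filtered table, position 0 and label 1 refer to the same row, which is why `iloc[0]` and `loc[1]` agree while `loc[0]` raises an error.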

#### Filtering Rows

* You can filter rows by applying a condition inside square brackets `[]`.  
* The syntax is: `df[df['column_name'] > value]`.  
    * The `>` operator can be replaced with any other comparison operator, such as `<`, `>=`, `<=`, `==`, or `!=`.

##### Example

Let's keep only the rows where `Salary_Min` is greater than **100000**. The condition on its own yields a Boolean Series, which `df[...]` then uses to select the rows:


In [37]:
df['Salary_Min'] > 100000

0         True
1         True
2         True
3        False
4         True
         ...  
11180     True
11181     True
11182     True
11183     True
11184     True
Name: Salary_Min, Length: 11185, dtype: bool

In [38]:
df[df['Salary_Min'] > 100000]

Unnamed: 0,Job_Title,Company,Description,Location,Salary_Min,Salary_Max,Date_Posted,URL
0,Senior Software Engineer (Python),BP Energy,Entity: Trading & Shipping Job Family Group: S...,"Crestwood, Houston",138992.40,138992.40,2024-10-29T16:35:26Z,https://www.adzuna.com/land/ad/4917931721?se=N...
1,Sr. Backend Software Engineer,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",118638.80,118638.80,2024-11-10T01:13:11Z,https://www.adzuna.com/land/ad/4933370156?se=N...
2,Sr. Software Engineer - Mobile,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",108041.95,108041.95,2024-10-15T11:51:30Z,https://www.adzuna.com/land/ad/4902683574?se=N...
4,Senior Software Engineer,Innova,A client of Innova Solutions is immediately hi...,"Richardson, Dallas",121932.35,121932.35,2024-11-15T09:42:55Z,https://www.adzuna.com/details/4940271538?utm_...
5,Software Engineering Manager,Softworld Inc,Job Title: 80623 - Software Engineering Manage...,"Bridgeton, Saint Louis County",133348.23,133348.23,2024-11-06T20:44:34Z,https://www.adzuna.com/details/4929157424?utm_...
...,...,...,...,...,...,...,...,...
11180,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"San Francisco, California",110000.00,160000.00,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400312?utm_...
11181,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"Austin, Travis County",110000.00,160000.00,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400320?utm_...
11182,Blockchain Developer (Rust/Move),Supra,Who We Are Supra is pioneering the future of i...,"Perham, Otter Tail County",126635.63,126635.63,2024-08-01T10:52:59Z,https://www.adzuna.com/details/4805320117?utm_...
11183,Blockchain Ecosystem Developer Advocate - USA ...,Crypto Recruit,Hyperfast blockchain building out ecosystem gl...,US,155731.52,155731.52,2024-11-06T09:13:29Z,https://www.adzuna.com/details/4928326240?utm_...


If you want to view only the rows where the job title (`Job_Title`) is **"Senior Software Engineer"**, you can use the `==` operator to filter the data.

In [39]:
df[df['Job_Title'] == 'Senior Software Engineer']

Unnamed: 0,Job_Title,Company,Description,Location,Salary_Min,Salary_Max,Date_Posted,URL
4,Senior Software Engineer,Innova,A client of Innova Solutions is immediately hi...,"Richardson, Dallas",121932.35,121932.35,2024-11-15T09:42:55Z,https://www.adzuna.com/details/4940271538?utm_...
434,Senior Software Engineer,"Komodo Co., Ltd.",About KOMODO KOMODO works on products that sha...,"Honolulu, Hawaii",167803.20,167803.20,2024-01-03T14:47:03Z,https://www.adzuna.com/details/4508854608?utm_...
436,Senior Software Engineer,Tech Firefly,Tech Firefly is teaming up with a deep learnin...,US,170000.00,220000.00,2024-03-01T16:23:33Z,https://www.adzuna.com/details/4588945381?utm_...
439,Senior Software Engineer,GrowthBook,"About GrowthBook At GrowthBook, we are buildin...","Palo Alto, Santa Clara County",174824.69,174824.69,2023-11-14T20:44:48Z,https://www.adzuna.com/details/4433358852?utm_...
440,Senior Software Engineer,Aviture,What is Aviture? Aviture provides custom softw...,"Omaha, Douglas County",147484.70,147484.70,2023-12-26T17:27:22Z,https://www.adzuna.com/details/4497539636?utm_...
...,...,...,...,...,...,...,...,...
1485,Senior Software Engineer,Antares,"About Us At Antares, our long-term mission is ...","Los Angeles, Los Angeles County",175634.50,175634.50,2024-08-01T11:10:25Z,https://www.adzuna.com/details/4805420051?utm_...
1488,Senior Software Engineer,Ludus,Senior Software Engineer We are looking for a ...,US,130.00,150.00,2024-11-16T06:15:03Z,https://www.adzuna.com/details/4941323191?utm_...
1490,Senior Software Engineer,The Walt Disney Company,Job Posting Title: Senior Software Engineer Re...,"San Francisco, California",149000.00,199800.00,2024-09-22T06:13:13Z,https://www.adzuna.com/details/4872599438?utm_...
1493,Senior Software Engineer,Parsons Technical Services,"In a world of possibilities, pursue one with e...","Colorado Springs, El Paso County",104200.00,182400.00,2024-09-30T07:37:21Z,https://www.adzuna.com/details/4882015618?utm_...
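The filter above works in two steps, which can be easier to see on a small table. The sketch below uses a hypothetical three-row DataFrame named `jobs` (not the real dataset) with the same column names, and separates the Boolean mask from the indexing step.

```python
import pandas as pd

# Hypothetical toy data that mimics two columns of the job postings table
jobs = pd.DataFrame({
    "Job_Title": ["Senior Software Engineer", "Data Analyst", "Senior Software Engineer"],
    "Salary_Min": [120000.0, 70000.0, 95000.0],
})

# Step 1: the comparison returns a Boolean Series, one value per row
mask = jobs["Job_Title"] == "Senior Software Engineer"

# Step 2: indexing with the mask keeps only the rows where the mask is True
senior = jobs[mask]

print(mask.tolist())
print(senior)
```

Note that the original row labels are preserved in the result, which is why the filtered output above shows index values such as 4, 434 and 436 rather than 0, 1, 2.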


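For more complex logical conditions, use the element-wise operators `&` and `|` instead of `and` and `or`. Wrap each condition in parentheses, because `&` and `|` bind more tightly than comparison operators such as `==` and `>`.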

In [None]:
df[(df.Job_Title == 'Senior Software Engineer') & (df.Salary_Min > 100000)]

In [None]:
df[(df.Job_Title == 'Senior Software Engineer') | (df.Salary_Min > 100000)]
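As a minimal sketch of why `and` is not suitable here, the example below builds two masks on a hypothetical toy DataFrame (`jobs`, invented for illustration) and shows that `and` raises a `ValueError`, while `&`, `|` and `~` combine the masks row by row.

```python
import pandas as pd

# Hypothetical toy data, not the real job postings dataset
jobs = pd.DataFrame({
    "Job_Title": ["Senior Software Engineer", "Data Analyst", "Senior Software Engineer"],
    "Salary_Min": [120000.0, 110000.0, 95000.0],
})

is_senior = jobs["Job_Title"] == "Senior Software Engineer"
high_pay = jobs["Salary_Min"] > 100000

# `and` asks Python for a single truth value of a whole Series, which is ambiguous
try:
    jobs[is_senior and high_pay]
    and_failed = False
except ValueError:
    and_failed = True

both = jobs[is_senior & high_pay]    # rows that satisfy both conditions
either = jobs[is_senior | high_pay]  # rows that satisfy at least one condition
not_senior = jobs[~is_senior]        # `~` negates a mask

print(and_failed)
print(list(both.index), list(either.index), list(not_senior.index))
```

Storing each condition in a named variable, as done here, also removes the need for parentheses around the comparisons.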

The figure below summarizes the different methods.

![image.png](attachment:image.png)

## Finding Non-NA Values

* `pd.notna()` - detects values that are **not** NA (missing values). The same check is available as the `.notna()` method on a Series or DataFrame.
* It returns a Boolean object of the same size indicating whether each value is **not** NA.
    * **Non-missing** values are `True`
    * **Missing** values are `False`
* Useful for **data preprocessing**, for removing or filling missing values, or for making decisions based on the presence of data.
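A small sketch of the behavior described above, using a hypothetical four-value Series (invented for illustration) in which two entries are missing. It compares the function form and the method form, and uses the mask to count the values that are present.

```python
import numpy as np
import pandas as pd

# Hypothetical salary values: np.nan and None are both treated as missing
salary_max = pd.Series([150000.0, np.nan, 98000.0, None], name="Salary_Max")

mask_fn = pd.notna(salary_max)      # top-level function
mask_method = salary_max.notna()    # equivalent Series method

# True counts as 1, so summing the mask counts the non-missing values
n_present = int(mask_method.sum())

# isna() is the complement of notna()
mask_missing = salary_max.isna()

print(mask_method.tolist())
print(n_present)
```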


In [42]:
df.Salary_Max.notna()

0        True
1        True
2        True
3        True
4        True
         ... 
11180    True
11181    True
11182    True
11183    True
11184    True
Name: Salary_Max, Length: 11185, dtype: bool

To keep only the rows with no missing value in a specific column, use the mask to index the DataFrame:

In [43]:
df[df['Salary_Max'].notna()]

Unnamed: 0,Job_Title,Company,Description,Location,Salary_Min,Salary_Max,Date_Posted,URL
0,Senior Software Engineer (Python),BP Energy,Entity: Trading & Shipping Job Family Group: S...,"Crestwood, Houston",138992.40,138992.40,2024-10-29T16:35:26Z,https://www.adzuna.com/land/ad/4917931721?se=N...
1,Sr. Backend Software Engineer,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",118638.80,118638.80,2024-11-10T01:13:11Z,https://www.adzuna.com/land/ad/4933370156?se=N...
2,Sr. Software Engineer - Mobile,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",108041.95,108041.95,2024-10-15T11:51:30Z,https://www.adzuna.com/land/ad/4902683574?se=N...
3,Acquisition Software Engineer,Naval Air Systems Command,Position Description The Harpoon/SLAM ER/JSOW ...,"China Lake, Kern County",88583.57,88583.57,2024-11-16T04:21:41Z,https://www.adzuna.com/land/ad/4941260438?se=N...
4,Senior Software Engineer,Innova,A client of Innova Solutions is immediately hi...,"Richardson, Dallas",121932.35,121932.35,2024-11-15T09:42:55Z,https://www.adzuna.com/details/4940271538?utm_...
...,...,...,...,...,...,...,...,...
11180,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"San Francisco, California",110000.00,160000.00,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400312?utm_...
11181,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"Austin, Travis County",110000.00,160000.00,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400320?utm_...
11182,Blockchain Developer (Rust/Move),Supra,Who We Are Supra is pioneering the future of i...,"Perham, Otter Tail County",126635.63,126635.63,2024-08-01T10:52:59Z,https://www.adzuna.com/details/4805320117?utm_...
11183,Blockchain Ecosystem Developer Advocate - USA ...,Crypto Recruit,Hyperfast blockchain building out ecosystem gl...,US,155731.52,155731.52,2024-11-06T09:13:29Z,https://www.adzuna.com/details/4928326240?utm_...


In [44]:
df[(df.Job_Title == 'Senior Software Engineer') & (df.Salary_Max.notna())]

Unnamed: 0,Job_Title,Company,Description,Location,Salary_Min,Salary_Max,Date_Posted,URL
4,Senior Software Engineer,Innova,A client of Innova Solutions is immediately hi...,"Richardson, Dallas",121932.35,121932.35,2024-11-15T09:42:55Z,https://www.adzuna.com/details/4940271538?utm_...
434,Senior Software Engineer,"Komodo Co., Ltd.",About KOMODO KOMODO works on products that sha...,"Honolulu, Hawaii",167803.20,167803.20,2024-01-03T14:47:03Z,https://www.adzuna.com/details/4508854608?utm_...
436,Senior Software Engineer,Tech Firefly,Tech Firefly is teaming up with a deep learnin...,US,170000.00,220000.00,2024-03-01T16:23:33Z,https://www.adzuna.com/details/4588945381?utm_...
439,Senior Software Engineer,GrowthBook,"About GrowthBook At GrowthBook, we are buildin...","Palo Alto, Santa Clara County",174824.69,174824.69,2023-11-14T20:44:48Z,https://www.adzuna.com/details/4433358852?utm_...
440,Senior Software Engineer,Aviture,What is Aviture? Aviture provides custom softw...,"Omaha, Douglas County",147484.70,147484.70,2023-12-26T17:27:22Z,https://www.adzuna.com/details/4497539636?utm_...
...,...,...,...,...,...,...,...,...
1485,Senior Software Engineer,Antares,"About Us At Antares, our long-term mission is ...","Los Angeles, Los Angeles County",175634.50,175634.50,2024-08-01T11:10:25Z,https://www.adzuna.com/details/4805420051?utm_...
1488,Senior Software Engineer,Ludus,Senior Software Engineer We are looking for a ...,US,130.00,150.00,2024-11-16T06:15:03Z,https://www.adzuna.com/details/4941323191?utm_...
1490,Senior Software Engineer,The Walt Disney Company,Job Posting Title: Senior Software Engineer Re...,"San Francisco, California",149000.00,199800.00,2024-09-22T06:13:13Z,https://www.adzuna.com/details/4872599438?utm_...
1493,Senior Software Engineer,Parsons Technical Services,"In a world of possibilities, pursue one with e...","Colorado Springs, El Paso County",104200.00,182400.00,2024-09-30T07:37:21Z,https://www.adzuna.com/details/4882015618?utm_...


You can apply this method to several columns at once.  
For example, to return only the rows where **`Salary_Max`** and **`Company`** are **not** null, use:

In [45]:
df[df[['Salary_Max', 'Company']].notna().all(axis=1)]

Unnamed: 0,Job_Title,Company,Description,Location,Salary_Min,Salary_Max,Date_Posted,URL
0,Senior Software Engineer (Python),BP Energy,Entity: Trading & Shipping Job Family Group: S...,"Crestwood, Houston",138992.40,138992.40,2024-10-29T16:35:26Z,https://www.adzuna.com/land/ad/4917931721?se=N...
1,Sr. Backend Software Engineer,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",118638.80,118638.80,2024-11-10T01:13:11Z,https://www.adzuna.com/land/ad/4933370156?se=N...
2,Sr. Software Engineer - Mobile,Meijer,"As a family company, we serve people and commu...","Belmont, Kent County",108041.95,108041.95,2024-10-15T11:51:30Z,https://www.adzuna.com/land/ad/4902683574?se=N...
3,Acquisition Software Engineer,Naval Air Systems Command,Position Description The Harpoon/SLAM ER/JSOW ...,"China Lake, Kern County",88583.57,88583.57,2024-11-16T04:21:41Z,https://www.adzuna.com/land/ad/4941260438?se=N...
4,Senior Software Engineer,Innova,A client of Innova Solutions is immediately hi...,"Richardson, Dallas",121932.35,121932.35,2024-11-15T09:42:55Z,https://www.adzuna.com/details/4940271538?utm_...
...,...,...,...,...,...,...,...,...
11180,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"San Francisco, California",110000.00,160000.00,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400312?utm_...
11181,Blockchain Developer (The Decentralization Arc...,Unreal Gigs,Do you have a passion for decentralized system...,"Austin, Travis County",110000.00,160000.00,2024-10-20T23:57:32Z,https://www.adzuna.com/details/4908400320?utm_...
11182,Blockchain Developer (Rust/Move),Supra,Who We Are Supra is pioneering the future of i...,"Perham, Otter Tail County",126635.63,126635.63,2024-08-01T10:52:59Z,https://www.adzuna.com/details/4805320117?utm_...
11183,Blockchain Ecosystem Developer Advocate - USA ...,Crypto Recruit,Hyperfast blockchain building out ecosystem gl...,US,155731.52,155731.52,2024-11-06T09:13:29Z,https://www.adzuna.com/details/4928326240?utm_...
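The multi-column filter can also be written with `DataFrame.dropna(subset=...)`. The sketch below, on a hypothetical five-row DataFrame (invented for illustration), checks that the two forms select the same rows and shows how `.any(axis=1)` differs from `.all(axis=1)`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values in both columns
jobs = pd.DataFrame({
    "Company": ["Innova", None, "Supra", "Ludus", None],
    "Salary_Max": [121932.35, 160000.0, np.nan, 150.0, np.nan],
})

cols = ["Salary_Max", "Company"]

# all(axis=1): keep a row only if every listed column is present
via_mask = jobs[jobs[cols].notna().all(axis=1)]

# dropna with subset: drop a row if any listed column is missing
via_dropna = jobs.dropna(subset=cols)

# any(axis=1): keep a row if at least one listed column is present
any_present = jobs[jobs[cols].notna().any(axis=1)]

print(list(via_mask.index))
print(list(any_present.index))
```

The mask form is convenient when the condition has to be combined with other filters using `&` or `|`; `dropna` is shorter when removing incomplete rows is the only goal.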
