<center>
<img src="./images/strikes_cover.png" alt="cover image" width="800">
</center>

## 🧭 General Details

**Domain**: public transport strikes  
**Market location**: Milan (Italy)  
**Company**: [A.T.M.](https://en.wikipedia.org/wiki/Azienda_Trasporti_Milanesi) (Milan's local transport company)  
**Stakeholders**: customers  
**Analysis type**: descriptive  

## 🎯 Scope of Work
The lack of structured analysis makes it difficult to predict the potential impact of a strike and to estimate the participation level (**business problem**). The purpose of this analysis is to give public transport customers a clear view of strike participation, based on official historical data published directly by the local transport company, highlighting patterns and trends (**analysis goal**). The main **deliverables** that qualify the analysis as successful are visualizations tailored to the stakeholders as social media posts: 3 posts, each with an image, labels and charts, plus a caption describing the insights. The final output will be in Italian.

## ❓ Business Questions
Within this analysis, the dataset (see the next section 'Data Preparation') directly inspired the questions below, which follow the **SMART methodology** (Specific, Measurable, Action-oriented, Relevant, Time-bound). The standard **time-bound** reference period is the last 5 years; trend analyses may use data from 2016–2023 when more historical depth is needed. The reference period may vary according to the findings (e.g., Covid in 2020 may skew the data), and the last available year may not be complete.

<br>

- **Q1**: When do strikes happen, and how often?
- **Q2**: Which unions are involved and how do they affect me?
- **Q3**: How disruptive are strikes, and is it changing over time?

<br>

| | Business Problem | Business Question |
| --- | --- | --- |
| `Q1.1` | I want to understand how frequently my service is affected | _How many transport strikes happened in Milan?_ |
| `Q1.2` | I’d like to spot patterns so I can plan accordingly | _Which months and weekdays had the highest number of strikes?_ |
| `Q2.1` | I want to know who’s mainly responsible for disruptions | _Which unions declared the most strikes?_ |
| `Q2.2` | I’d like to estimate how serious a strike could be from the names of the unions involved | _Which unions caused the highest average participation rate from ATM workers during strikes?_ |
| `Q2.3` | I want to understand which unions tend to act as part of broader coalitions | _Which unions most frequently strike in coordination with others?_ |
| `Q3.1` | I want to know if strikes are becoming more impactful or less | _How has the average employee participation rate changed each year?_ |
| `Q3.2` | I want to prepare for more serious disruptions in specific seasons | _Which months had the highest average participation in strikes?_ |
| `Q3.3` | I want to assess whether coordinated strikes are more disruptive | _Do strikes involving more than one union result in higher participation?_ |


# 🪪 Data Preparation
<center>
<img src="./images/dataset_preview.png" alt="dataset preview" width="800">
</center>

<br>
    
## 📁 Data Source Details

- **Provider**: Azienda Trasporti Milanesi (ATM)  | [URL](https://www.atm.it/it/IlGruppo/personale/Pagine/Adesionescioperi.aspx)
- **Format**: HTML table format | long data
- **Period covered**: 2016–2025  

## 📌 Data Source Origins, Credibility & Bias

- The dataset combines **first-party** data (e.g., participation rates observed directly by ATM) and **second-party** data (e.g., strike declarations communicated by unions).
- Evaluated using the **R.O.C.C.C. framework**:
  - **Reliable**: Sourced from official platforms  
  - **Original**: Data collected and published by ATM  
  - **Comprehensive**: Includes major fields relevant to strike events  
  - **Current**: Continuously updated  
  - **Cited**: Source acknowledged and traceable  
- **Bias check**: No evident manipulation or selection bias detected. Strike reporting appears systematic.


## 🔐 Data Ethics, Privacy & Security

- No privacy risks: dataset is **publicly accessible** and **aggregated** (no personal data).
- **Security controls**, **data lifecycle policies**, and **access permissions** are **not applicable** due to the dataset's open nature.
- Terms of use and legal conditions are detailed on ATM's [official policy page](https://www.atm.it/it/Pagine/CondizioniDUso.aspx).


## ❓ How data has been collected

The dataset lacks metadata on collection methodology. The following is an inferred reconstruction:

1. Strike announcement submitted by unions to the [Italian Transport Ministry](https://scioperi.mit.gov.it/mit2/public/scioperi)  
2. Ministry and/or unions notify the transport company (ATM)  
3. ATM records the strike event and measures **employee participation**\*

*_Assumption: The participation percentage likely **excludes** employees who were absent due to other justified reasons (e.g., sick leave). Confirmation from ATM would improve data interpretation._
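Under the assumption in the footnote above, the published rate would correspond to the formula below. The symbols are hypothetical labels introduced here, not ATM's own definitions: $N_{\text{strike}}$ is the number of employees on strike, $N_{\text{sched}}$ the employees scheduled for service that day, and $N_{\text{abs}}$ those absent for other justified reasons (e.g., sick leave).

```math
p = \frac{N_{\text{strike}}}{N_{\text{sched}} - N_{\text{abs}}} \times 100
```

For example, with invented figures of 1,000 scheduled employees, 50 on sick leave and 190 strikers, $p = 190 / 950 \times 100 = 20\%$. If ATM instead divides by all scheduled employees, the same day would read 19%, which is why confirming the definition matters.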


## 🔍 How is data organized

| Field                      | Description                                                     |
|--------------------------- |---------------------------------------------------------------- |
| `date`                     | Date of the strike (datetime format)                            |
| `unions`                   | Union(s) involved (string; may contain multiple values)         |
| `employee_partecipation`   | % of ATM workers who participated (float, 0–100)                |

<br>

## 🗃️ Data Assumptions & Limitations

- Scope: Only official strikes affecting ATM
- Participation: Aggregated across all unions per event
- Data Gaps: No timing details for partial-day strikes, potential inconsistencies in text fields


## 🛠️ How we will use the data

Based on the business questions and the available data, **we build the metrics** needed to answer them. Below are the key metrics and how each one is calculated.

| Questions | Metric | Description |
| --- | --- | --- |
| `Q1.1` | Yearly Strike Volume | _Total number of strike events recorded per year_ |
| `Q1.2` | Monthly Seasonality | _Aggregated strike counts per calendar month, highlighting seasonal patterns_ |
| `Q1.3` | Weekday Distribution | _Frequency of strikes per day of the week, useful for weekly planning_ |
| `Q2.1` | Top Active Unions | _Count of total strike events attributed to each union_ |
| `Q2.2` | Union-Specific Participation | _Average employee participation per union across their declared strikes_ |
| `Q2.3` | Union Coalition Tendency | _For each union, average number of other unions involved in shared strikes (proxy for coalition behavior)_ |
| `Q3.1` | Annual Disruption Trend | _Year-over-year average employee participation in strikes (proxy for impact severity)_ |
| `Q3.2` | Seasonal Intensity | _Monthly averages of participation rates across years to detect high-impact seasons_ |
| `Q3.3` | Union Count vs. Participation Relationship | _Correlation between number of unions per strike and participation levels to assess whether multi-union strikes are more impactful_ |
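As a rough cross-check of the definitions above, most of these metrics can be sketched in plain Python. This is not the logic of the Excel workbook: it assumes each strike is a dict with a `Date` (`datetime.date`), a comma-separated `Unions` string, an integer `Partecipation` and a `Union Count`, mirroring the columns described in the Data Transformation section, and that `exploded` holds one such row per union. The sample events are invented and the union names are placeholders.

```python
from collections import Counter, defaultdict
from datetime import date
from statistics import mean


def strikes_per_year(events):        # Q1.1
    return Counter(e["Date"].year for e in events)


def strikes_per_month(events):       # Q1.2
    return Counter(e["Date"].month for e in events)


def strikes_per_weekday(events):     # Q1.3 (0 = Monday ... 6 = Sunday)
    return Counter(e["Date"].weekday() for e in events)


def union_metrics(exploded):         # Q2.1, Q2.2, Q2.3
    """Per union: strike count, mean participation, mean number of partner unions."""
    participation = defaultdict(list)
    partners = defaultdict(list)
    for row in exploded:
        participation[row["Unions"]].append(row["Partecipation"])
        # Partner unions in this strike = all unions involved minus this one.
        partners[row["Unions"]].append(row["Union Count"] - 1)
    return {
        union: {
            "strikes": len(values),
            "avg_participation": mean(values),
            "avg_partners": mean(partners[union]),
        }
        for union, values in participation.items()
    }


def mean_participation_by(events, key):  # Q3.1 (key=year), Q3.2 (key=month)
    groups = defaultdict(list)
    for e in events:
        groups[key(e["Date"])].append(e["Partecipation"])
    return {k: mean(v) for k, v in sorted(groups.items())}


# Hypothetical sample (invented values, for illustration only).
events = [
    {"Date": date(2023, 3, 3), "Unions": "UNION_A,UNION_B", "Partecipation": 30, "Union Count": 2},
    {"Date": date(2023, 3, 10), "Unions": "UNION_A", "Partecipation": 10, "Union Count": 1},
    {"Date": date(2024, 4, 5), "Unions": "UNION_B", "Partecipation": 20, "Union Count": 1},
]
# One row per union, as in the 'exploded_data' query.
exploded = [{**e, "Unions": name} for e in events for name in e["Unions"].split(",")]
```

For example, `mean_participation_by(events, lambda d: d.year)` gives the yearly trend and `lambda d: d.month` the seasonal view. Counting strikes on the non-exploded rows keeps each event once, while the per-union metrics deliberately count a multi-union strike once for every union involved.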


# 🛠️ Data Transformation
<center>
<img src="./images/query_dependencies.png" alt="query dependencies" width="800">
</center>
    
## 📥 Data Import & Extraction  
1. **Data import**: Imported the dataset directly from the official website via URL ([source link](https://www.atm.it/it/IlGruppo/personale/Pagine/Adesionescioperi.aspx))  
2. **Dataset extraction**: Extracted 'Table 4' from the page and loaded it into Power Query as the 'raw_data' query for transformation
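Power Query's web connector handled the import here. For readers who want to reproduce the extraction outside Excel, a minimal standard-library sketch follows. It assumes the page serves plain, non-nested HTML tables; which list index corresponds to Power Query's 'Table 4' is not documented in this project and has to be checked by hand against the page.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect cell text from every <table>, as a list of rows per table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> found
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return every table in `html` as a list of rows (lists of cell strings)."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Usage: download the page source, call `extract_tables(html)` and pick the table whose header row reads `Data`, `Proclamante`, `Percentuale di adesione`, rather than relying on a fixed position.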

## 🔍 Data Integrity

- **Data accuracy**: We trust the dataset's **accuracy**, **completeness**, and **consistency** based on initial exploration.  
- **Data life cycle**: Not required, as the dataset is **public** and its integrity has been confirmed.  
- **Trust level**: Data is reliable for the scope of this analysis.


## 🧹 Data Cleaning

The cleaning process was executed using **Power Query** in **Microsoft Excel**, transforming the original 'raw_data' into 'clean_data'.

### Cleaning Operations Summary

| Column/Feature | Procedure | Details |
|----------------|-----------|---------|
| Column Headers | Renamed | `Data → Date`, `Proclamante → Unions`, `Percentuale di adesione → Partecipation` |
| Data Rows | Filtered | Removed first row (header row mistakenly included in data) |
| Participation | Value Imputation | Imputed the missing value for the 08/11/2024 strike as 67%, based on the service hours ratio |
| Participation | Format Standardization | Removed % symbols, standardized decimal format, rounded to integers |
| Unions | Text Standardization | Converted to uppercase, removed dots and unnecessary prefixes |
| Unions | Separator Standardization | Replaced various separators (`;`, ` - `, ` E `, etc.) with commas |
| Unions | Manual Corrections | Fixed 5 specific records with inconsistent formats (24/02/2025, 29/11/2024, etc.) |
| New Features | Added Columns | Created 'Day of Week' and 'Union Count' columns for analysis |
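The text standardization steps in the table can be sketched in Python as follows, assuming values arrive as raw strings. The sketch covers only the separators explicitly listed (`;`, ` - `, ` E `); the unspecified "unnecessary prefixes" and the 5 manual corrections are not reproduced. `clean_unions` and `clean_participation` are hypothetical helper names, not part of the Power Query file.

```python
SEPARATORS = (";", " - ", " E ")  # separators named in the table above


def clean_unions(raw):
    """Uppercase, drop dots, and normalize the listed separators to commas."""
    # Uppercasing first means an Italian " e " ("and") is caught by " E ".
    text = raw.upper().replace(".", "")
    for sep in SEPARATORS:
        text = text.replace(sep, ",")
    # Trim each name and drop empty pieces left by doubled separators.
    names = [name.strip() for name in text.split(",")]
    return ",".join(name for name in names if name)


def clean_participation(raw):
    """Turn strings like '12,4%' into an integer percentage."""
    text = raw.replace("%", "").strip().replace(",", ".")
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return round(float(text))
```

Replacing separators only after removing dots keeps hyphenated acronyms such as `F.I.L.T.-C.G.I.L.` intact as `FILT-CGIL`, since only a hyphen surrounded by spaces counts as a separator.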


## 🔄 Data Transformation

We created two additional reference queries, 'processed_data' and 'exploded_data', so that the data is ready for analysis. The 'exploded_data' query starts from 'processed_data' and splits the unions column by delimiter into multiple rows, one per union.

| Type | Feature Added | Implementation | Purpose |
|------|--------------|----------------|---------|
| Temporal | Weekday Column | Custom column using `Date.DayOfWeekName([Date])` | Enable day-of-week pattern analysis |
| Analytical | Union Count | Custom column using `List.Count(Text.Split([Unions], ","))` | Quantify number of unions per strike |
| Structural | Exploded Union Rows | Split rows by comma delimiter | Create one row per union for individual analysis |
| Categorical | Main Union Classification | Applied union categorization logic | Group unions into meaningful categories |
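The first three rows of the table can be mirrored in Python as a sketch, not as the Power Query code itself. It assumes rows are dicts with a `datetime.date` under `Date` and already-cleaned, comma-separated `Unions`. Weekday names come from a fixed English tuple so the output does not depend on the system locale; the sample row is invented.

```python
from datetime import date

# Fixed English names, indexed by date.weekday() (0 = Monday).
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def add_features(rows):
    """Return copies of the rows with 'Weekday' and 'Union Count' added."""
    return [
        {**row,
         "Weekday": WEEKDAYS[row["Date"].weekday()],
         "Union Count": len(row["Unions"].split(","))}
        for row in rows
    ]


def explode_unions(rows):
    """One output row per union, like splitting the column into rows."""
    return [
        {**row, "Unions": name.strip()}
        for row in rows
        for name in row["Unions"].split(",")
    ]


# Hypothetical example row (invented values, not taken from the ATM dataset).
sample = [{"Date": date(2023, 3, 3), "Unions": "UNION_A,UNION_B", "Partecipation": 20}]
featured = add_features(sample)
exploded = explode_unions(featured)
```

Adding `Union Count` before exploding means every per-union row still knows how many unions shared that strike, which is what the coalition metric (Q2.3) needs. Both the count and the split assume the Unions text was cleaned first, so that no empty pieces remain.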

> **Strong assumption**: since the unions are many and varied, we categorize each acronym into a main category to simplify the hierarchy. The categorization criterion replaces each acronym with the broader union it belongs to (e.g., FILT-CGIL and AL COBAS are sub-unions of CGIL and COBAS, respectively). The `Main Unions` used for the classification are listed below; all other unions are categorized as 'OTHER'.

| Confederations | Autonomous Confederations | Base Unions |
|----------------|---------------------------|-------------|
| CGIL, CISL, UIL, UGL | CUB, USB, CONFAIL, CISAL | COBAS, SGB, ORSA |
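One way to express this categorization is sketched below, assuming a whole-word match between the acronym's tokens and the main union names. This is a simplification, not the exact rule applied in Power Query: a name written as a single word, such as UILTRASPORTI, would fall into 'OTHER' unless it is added explicitly. `main_union` and `category_of` are hypothetical helper names.

```python
import re

# Main unions from the table above; anything else maps to "OTHER".
MAIN_UNIONS = {
    "Confederations": ("CGIL", "CISL", "UIL", "UGL"),
    "Autonomous Confederations": ("CUB", "USB", "CONFAIL", "CISAL"),
    "Base Unions": ("COBAS", "SGB", "ORSA"),
}


def main_union(acronym):
    """Map a union acronym to the main union it belongs to, or 'OTHER'."""
    # Split on anything that is not a letter: "FILT-CGIL" -> {"FILT", "CGIL"}.
    tokens = set(re.split(r"[^A-Z]+", acronym.upper()))
    for names in MAIN_UNIONS.values():
        for name in names:
            if name in tokens:
                return name
    return "OTHER"


def category_of(main):
    """Return the table column (category) for a main union name."""
    for category, names in MAIN_UNIONS.items():
        if main in names:
            return category
    return "OTHER"
```

Matching whole tokens rather than substrings keeps similar acronyms such as CISL and CISAL apart.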




# 📖 Analysis Results

We use Excel to group, filter and visualize the available data and to find insights, according to the main questions. The **full Excel file ('analysis.xlsx') is available** in the main folder.

<center>
<img src="./images/strikes_per_year.png" alt="strikes per year" width="500">
</center>

<center>
<img src="./images/strikes_per_month.png" alt="strikes per month" width="500">
</center>

<center>
<img src="./images/strikes_per_day.png" alt="strikes per weekday" width="500">
</center>


<center>
<img src="./images/strikes_per_union.png" alt="strikes per union" width="600">
</center>

<center>
<img src="./images/partecipation_per_union.png" alt="participation per union" width="600">
</center>

<center>
<img src="./images/union_count.png" alt="union count" width="600">
</center>


<center>
<img src="./images/partecipation_per_year.png" alt="participation per year" width="600">
</center>

<center>
<img src="./images/partecipation_per_month.png" alt="participation per month" width="600">
</center>

<center>
<img src="./images/partecipation_per_union_count.png" alt="participation per union count" width="600">
</center>


## 🎯 **Findings**
| **Chart Type**                                 | **Key Label / Message**                                      |
|------------------------------------------------|---------------------------------------------------------------|
| **Line Chart – Strikes per Year (2016–2024)**  | 📈 *Post-COVID: +100% average strikes, lower variability*     |
| **Bar Chart – Strikes per Month**              | 🕓 *March peaks; Summer & Holidays see dips*                 |
| **Bar Chart – Strikes per Weekday (2021–2025)**| 🗓️ *Friday–Monday spikes: maximize disruption window*         |
| **Bar Chart – Strikes by Main Union**          | 🚩 *COBAS strikes twice as often as the average union*       |
| **Bar Chart – Participation per Union**        | 🔍 *Engagement varies 15–25%; avg ~20%*                       |
| **Scatter Plot – Union Strike Strategy**       | ⚖️ *Few big vs. many small actions*                          |
| **Line Chart – Annual Participation Trend**    | 📉 *5% drop in participation over time*                      |
| **Bar Chart – Monthly Participation Trends**   | ⚠️ *April consistently below average*                        |
| **Scatter Plot – Participation vs. Unions**    | 🤝 *More unions = stronger turnout (r ≈ 0.5)*                |
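The last row reports r ≈ 0.5. Assuming this is a Pearson correlation coefficient (what Excel's `CORREL` and `PEARSON` functions return), computed over one (union count, participation) pair per strike, it can be reproduced with the sketch below. `pearson_r` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Unnormalized sums: the 1/(n-1) factors cancel in the ratio below.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)
```

Usage: `pearson_r([s["Union Count"] for s in strikes], [s["Partecipation"] for s in strikes])`, where `strikes` holds one row per strike (not the exploded rows, which would count multi-union strikes several times).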



# 📣 Share
To communicate our strike analysis findings effectively to Milan commuters, we'll create a series of digestible, visually consistent infographics that highlight the key findings.


## 🎨 Color Palette
| Color Name       | Hex Code     | Preview                                                 |
|------------------|--------------|----------------------------------------------------------|
| Raisin Black     | `#211E21`    | <span style="background-color:#211E21; padding: 0 1em;">&nbsp;</span> |
| Prussian Blue    | `#11283A`    | <span style="background-color:#11283A; padding: 0 1em;">&nbsp;</span> |
| Auburn           | `#A81921`    | <span style="background-color:#A81921; padding: 0 1em;">&nbsp;</span> |
| Fire Engine Red  | `#D11D26`    | <span style="background-color:#D11D26; padding: 0 1em;">&nbsp;</span> |
| Imperial Red     | `#EF2233`    | <span style="background-color:#EF2233; padding: 0 1em;">&nbsp;</span> |
| Orange Pantone   | `#F46924`    | <span style="background-color:#F46924; padding: 0 1em;">&nbsp;</span> |
| Pumpkin          | `#F5752E`    | <span style="background-color:#F5752E; padding: 0 1em;">&nbsp;</span> |
| Xanthous         | `#F9B113`    | <span style="background-color:#F9B113; padding: 0 1em;">&nbsp;</span> |
| Almond           | `#F8E0C8`    | <span style="background-color:#F8E0C8; padding: 0 1em;">&nbsp;</span> |
| Uranian Blue     | `#B0DCF4`    | <span style="background-color:#B0DCF4; padding: 0 1em;">&nbsp;</span> |
| White            | `#FFFFFF`    | <span style="background-color:#FFFFFF; padding: 0 1em;">&nbsp;</span> |


## 📊 Infographics
**How, When, and Why Strikes Hit Us**  
Strikes have doubled since COVID, strategically peaking on Fridays and in politically sensitive months like March. Unions differ not only in how often they strike, but also in how much participation they draw from ATM workers: some prioritize frequent minor actions, while others aim for fewer, high-impact events. Although strike frequency has been stable, worker participation is slowly declining, except when multiple unions coordinate: then, disruption surges. Understanding these patterns helps us anticipate and respond better to future actions.

<br><br>

<center>
<img src="./images/infographic_1.png" alt="infographic 1" width="600">
</center>

<br><br>

<center>
<img src="./images/infographic_2.png" alt="infographic 2" width="600">
</center>

<br><br>

<center>
<img src="./images/infographic_3.png" alt="infographic 3" width="600">
</center>