<a href="https://colab.research.google.com/github/mori-assereto/DataAlchemy/blob/main/Data_visualisation_projects.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Objective
 Present three data visualisation projects designed for clients with unique objectives and diverse audiences.

# Requirements

In [None]:
  !pip install -U -q PyDrive

In [None]:
import pandas as pd
import numpy as np
from google.colab import files
import matplotlib.pyplot as plt
import seaborn as sns

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials


# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Graph 1 - Women's Impact on Vacancies in Parliaments

🧑‍💻 Project: Sharing the Impact of Women on Vacancies in Parliaments

🗄️ Dataset

 Women in parliaments is the percentage of parliamentary seats in a single or lower house held by women.

📑 Brief Description of the Dataset

The "Women in parliaments" dataset provides information on the percentage of parliamentary seats held by women in a single or lower house in national parliaments. The data is compiled by the Inter-Parliamentary Union, based on information provided by national parliaments. It is important to note that country coverage may vary due to suspensions or dissolutions of parliaments, and some women may face obstacles in efficiently fulfilling their parliamentary mandate.

❓Research Question
The question the visualisation strategy seeks to answer is: How has women's representation in parliaments globally and regionally evolved over time, and what is its impact on the filling of vacancies?


In [None]:
link = 'https://drive.google.com/file/d/1f8DB3XOCROkHSIdIlpBXBUsyXWMp_qBa/edit'

# Extract the file ID from the link (avoid shadowing the built-in `id`)
file_id = link.split("/")[-2]

downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('share-of-women-in-parliament.csv')

data = pd.read_csv('share-of-women-in-parliament.csv')
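
The `link.split("/")[-2]` step works only while the file ID is the second-to-last path segment of the share link. A minimal sketch of a more defensive variant follows, assuming links of the form `.../file/d/<id>/...`. The helper name `extract_drive_id` is hypothetical and is not part of PyDrive.

```python
import re

def extract_drive_id(link: str) -> str:
    """Return the file ID from a Drive share link shaped like .../d/<id>/... (assumed layout)."""
    match = re.search(r"/d/([A-Za-z0-9_-]+)", link)
    if match is None:
        raise ValueError(f"No Drive file ID found in: {link}")
    return match.group(1)

link = 'https://drive.google.com/file/d/1f8DB3XOCROkHSIdIlpBXBUsyXWMp_qBa/edit'
file_id = extract_drive_id(link)
print(file_id)
```

The resulting `file_id` could then be passed to `drive.CreateFile({'id': file_id})` in the same way as in the cell above.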

### Data Preparation

In [None]:
# I define regions for specific visualisations
world = ['World']

continents= [
    "Africa",
    "Europe",
    "South America",
    "Asia",
    "North America",
    "Oceania"
    ]

# # Not here
# income = [
#     "High income",
#     "Upper middle income",
#     "Middle income",
#     "Lower middle income",
#     "Low income"
#     ]

df_world = data[data['Entity'].isin(world)]
df_continents = data[data['Entity'].isin(continents)]
# df_income = data[data['Entity'].isin(income)]

In [None]:
# Check which continents are present in the filtered data
df_continents['Entity'].unique()

array(['Africa', 'Asia', 'Europe', 'North America', 'Oceania',
       'South America'], dtype=object)

In [None]:
# Rename column
df_continents = df_continents.rename(columns={'wom_parl_vdem_owid': 'Share', 'Entity': 'Continents'})
df_continents

Unnamed: 0,Continents,Code,Year,Share
81,Africa,,1900,0.000000
82,Africa,,1901,0.000000
83,Africa,,1902,0.000000
84,Africa,,1903,0.000000
85,Africa,,1904,0.000000
...,...,...,...,...
13032,South America,,2018,27.663334
13033,South America,,2019,28.480833
13034,South America,,2020,28.633333
13035,South America,,2021,30.413637


## 📈 Selected Visualisation Strategy

To visualise the evolution of the percentage of female representation in parliaments, we have opted for a line graph. Each region (Africa, Europe, Asia, Oceania, North America, South America) will have its own timeline, and there will be an additional line representing the global average. Each region gets its own colour, chosen from a palette that keeps the lines easy to tell apart.

In [None]:
import pandas as pd
import plotly.express as px

# List of customised colours for each continent
color_mapping = {
    'Africa': '#FF2973',
    'Europe': '#3366FF',
    'South America': '#2FD6C8',
    'Asia': '#DC7FF4',
    'North America': '#4DDE14',
    'Oceania': '#FCC8BB'
}

# Creating interactive visualisation with Plotly Express and assigning custom colours
fig = px.line(df_continents, x='Year', y='Share', color='Continents', color_discrete_map=color_mapping,
              title='Female Representation in Parliaments by Region')

# Add a 'World Average' line in dark grey, computed as the unweighted mean of the continents' shares
avg_share = df_continents.groupby('Year')['Share'].mean().reset_index()
fig.add_scatter(x=avg_share['Year'], y=avg_share['Share'], mode='lines', name='World Average', line=dict(color='#2F3940'))

# Label the axes and show all series in a single hover tooltip
fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Percentage of Female Representation',
    hovermode='x unified'
)

# Add interactive filter by date
fig.update_xaxes(rangeslider_visible=True)

# Set initial range from 1940
fig.update_xaxes(range=[1940, df_continents['Year'].max()])

# Display the interactive graphic
fig.show()
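
The 'World Average' line in this chart is the plain mean of the continental series for each year, which is not necessarily the same as a `World` row reported in the dataset itself. The sketch below illustrates the difference on toy data. All numbers are made up for illustration, and the column layout is an assumption modelled on `df_continents`.

```python
import pandas as pd

# Toy data with made-up values (illustration only)
toy = pd.DataFrame({
    "Entity": ["Africa", "Europe", "World", "Africa", "Europe", "World"],
    "Year": [2000, 2000, 2000, 2001, 2001, 2001],
    "Share": [10.0, 20.0, 13.0, 12.0, 24.0, 15.5],
})
toy_continents = ["Africa", "Europe"]

# Unweighted mean across continents for each year (what the chart plots)
unweighted = toy[toy["Entity"].isin(toy_continents)].groupby("Year")["Share"].mean()

# Value reported directly for the 'World' entity
reported = toy[toy["Entity"] == "World"].set_index("Year")["Share"]

# Side-by-side comparison, aligned on Year
comparison = pd.DataFrame({"unweighted_mean": unweighted, "world_row": reported})
print(comparison)
```

If the real file contains a `World` entity, plotting `df_world` instead of the computed mean would be an alternative worth comparing.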


📊 Justification of the Visualisation Strategy

The choice of a line graph aims to provide a clear and effective representation of the temporal evolution of women's representation in parliaments. This type of graph is ideal for highlighting trends and patterns over time, which are essential for telling the story of women's participation in politics.

The inclusion of regional and global timelines reflects the global nature of the UN and its focus on gender equality. These timelines allow for a direct visual comparison between geographic areas, highlighting disparities and similarities. This supports the UN's mission to promote global gender equality.

Distinctive colours for each region, with accessibility checks and highlighting the global average in black, serve a functional and aesthetic purpose. They facilitate the identification of lines and highlight the diversity that the UN represents. The clarity of the lines and the variety of colours make the visualisation informative and captivating.

In summary, the chosen line graph not only effectively communicates the evolution of women's representation in parliaments, but also aligns with the UN's global values and goals on gender equality and women's empowerment. Attention to temporal trends and regional comparison educates on the relevance of equity in political decision-making.

# Graph 2 - Evolution of the Vaccination Index

🧑‍💻 Project: Sales strategy for DTP3 vaccine

🗄️ Dataset: Vaccination Coverage by Income

📑 Brief Description of the Dataset

The "Vaccination Coverage by Income" dataset relates vaccination coverage to income across regions and countries. Its columns are entity, code, year, total population, continent, output-side real GDP per capita, and the percentage of one-year-old children immunised with DTP3.

❓Research Questions

Why: To develop an effective marketing strategy for XYZ Pharmaceuticals' DTP3 vaccine, considering the variables of vaccination coverage, income levels and packaging options.

Research Question: How can we design a sales strategy that maximises the distribution and purchase of the DTP3 vaccine, taking into account the relationship between vaccination coverage, income and packaging presentation?

In [None]:
link = 'https://drive.google.com/file/d/1KWeWrDkIqVUWLc_wZDuhK_8APFWE0h7d/view?usp=drive_link'

# Extract the file ID from the link (avoid shadowing the built-in `id`)
file_id = link.split("/")[-2]

downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('vaccination-coverage-by-income-in.csv')

data_vaccination = pd.read_csv('vaccination-coverage-by-income-in.csv')

In [None]:
data_vaccination.sample(5)

Unnamed: 0,Entity,Code,Year,"Total population (Gapminder, HYDE & UN)",Continent,Output-side real GDP per capita (gdppc_o) (PWT 9.1 (2019)),DTP3 (% of one-year-olds immunized)
25401,Mali,MLI,1912,3313580.0,,,
32828,Paraguay,PRY,1956,1717000.0,,1764.4285,
36047,San Marino,SMR,1891,8705.0,,,
22546,Lebanon,LBN,1847,403687.0,,,
4603,Benin,BEN,1943,2112722.0,,,


In [None]:
data_vaccination = data_vaccination.rename(columns={'Entity' : 'Country','Total population (Gapminder, HYDE & UN)' : 'Total Population'})
data_vaccination.sample(5)

Unnamed: 0,Country,Code,Year,Total Population,Continent,Output-side real GDP per capita (gdppc_o) (PWT 9.1 (2019)),DTP3 (% of one-year-olds immunized)
20257,Jamaica,JAM,1882,578303.0,,,
15691,Ghana,GHA,1891,1791800.0,,,
1242,Angola,AGO,1928,3749017.0,,,
33523,Poland,POL,1989,37833000.0,,8821.9854,96.0
19217,Iran,IRN,2012,75540000.0,,16355.355,99.0


In [None]:
import plotly.express as px

# Filter the data for the desired range of years (1980-2020)
filtered_dataset = data_vaccination.query('Year >= 1980 and Year <= 2020')

# Sort the dataset by the column "Year".
filtered_dataset = filtered_dataset.sort_values('Year')

# Create the interactive choropleth map
fig = px.choropleth(filtered_dataset,
                    locations='Code',
                    color='DTP3 (% of one-year-olds immunized)',
                    hover_name='Country',
                    projection='natural earth',
                    animation_frame='Year',
                    title='DTP3 Vaccination Coverage by Country, 1980 to 2020')

# Show the interactive map
fig.show()
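
`px.choropleth` matches `locations` against ISO 3166-1 alpha-3 codes by default (`locationmode='ISO-3'`), so rows carrying other codes, such as the `OWID_ABK` entry that appears in this dataset, are not drawn on the map. The sketch below shows one way to keep only rows whose code looks like a three-letter ISO code. It uses toy rows rather than the real file, and a three-letter pattern is only a rough proxy for a valid ISO code.

```python
import pandas as pd

# Toy rows (illustration only); 'Unknown' stands in for a row with a missing code
toy = pd.DataFrame({
    "Country": ["Afghanistan", "Abkhazia", "Unknown", "Poland"],
    "Code": ["AFG", "OWID_ABK", None, "POL"],
})

# Keep rows whose code is exactly three uppercase letters; missing codes count as False
looks_iso3 = toy["Code"].str.match(r"[A-Z]{3}$", na=False)
iso_rows = toy[looks_iso3]
print(iso_rows)
```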

In [None]:
link = 'https://drive.google.com/file/d/1x7Ze5c3uOc6tYmnpCVYd_M3GduZGAvUc/view?usp=drive_link'

# Extract the file ID from the link (avoid shadowing the built-in `id`)
file_id = link.split("/")[-2]

downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('continents2.csv')

continents = pd.read_csv('continents2.csv')

In [None]:
continents = continents.rename(columns={'alpha-3':'Code', 'region':'Continent', 'name':'Country'})
columnas=['alpha-2', 'country-code', 'iso_3166-2', 'sub-region', 'intermediate-region', 'region-code', 'sub-region-code', 'intermediate-region-code',]
continents = continents.drop(columnas, axis=1)
continents.head(5)

Unnamed: 0,Country,Code,Continent
0,Afghanistan,AFG,Asia
1,Åland Islands,ALA,Europe
2,Albania,ALB,Europe
3,Algeria,DZA,Africa
4,American Samoa,ASM,Oceania


In [None]:
# I perform a merge to add the continent information
merged_dataset = data_vaccination.merge(continents[['Code', 'Continent']], on='Code', how='left', suffixes=('', '_new'))

# I take the continent from the lookup table and fall back to the original 'Continent' value where the lookup has none.
merged_dataset['Continent'] = merged_dataset['Continent_new'].fillna(merged_dataset['Continent'])

# I remove the auxiliary columns
merged_dataset.drop(['Continent_new'], axis=1, inplace=True)
merged_dataset.head(5)

Unnamed: 0,Country,Code,Year,Total Population,Continent,Output-side real GDP per capita (gdppc_o) (PWT 9.1 (2019)),DTP3 (% of one-year-olds immunized)
0,Abkhazia,OWID_ABK,2015,,Asia,,
1,Afghanistan,AFG,1800,3280000.0,Asia,,
2,Afghanistan,AFG,1801,3280000.0,Asia,,
3,Afghanistan,AFG,1802,3280000.0,Asia,,
4,Afghanistan,AFG,1803,3280000.0,Asia,,
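
The merge-then-fill pattern used above can be checked on a small example. The sketch below uses toy rows (not the real files) and adds `validate="many_to_one"`, an optional `merge` argument that raises an error if the lookup table contains duplicate codes, which would otherwise silently duplicate rows.

```python
import pandas as pd

# Toy stand-ins for data_vaccination and the continents lookup (illustration only)
vacc = pd.DataFrame({
    "Country": ["Poland", "Iran", "Abkhazia"],
    "Code": ["POL", "IRN", "OWID_ABK"],
    "Continent": [None, None, "Asia"],
})
lookup = pd.DataFrame({
    "Code": ["POL", "IRN"],
    "Continent": ["Europe", "Asia"],
})

# Left merge keeps every vaccination row; the lookup column arrives as 'Continent_new'
merged = vacc.merge(lookup, on="Code", how="left",
                    suffixes=("", "_new"), validate="many_to_one")

# Prefer the lookup value, fall back to the original where the lookup has no match
merged["Continent"] = merged["Continent_new"].fillna(merged["Continent"])
merged = merged.drop(columns="Continent_new")
print(merged)
```

The row count stays the same as in `vacc`, and the Abkhazia row keeps its original continent because its code is absent from the lookup.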


## 📈Selected Visualisation Strategy

To analyse the DTP3 vaccine sales strategy, we have chosen a line chart grouped by region. Each line will represent a region, and we will use different colours to distinguish the regions.

In [None]:
import plotly.express as px

# Filter the data for the desired range of years (1980-2020) and exclude "Antarctica" and "North America".
filtered_dataset = merged_dataset.query('Year >= 1980 and Year <= 2020 and Continent != "Antarctica" and Continent != "North America"')

# Group the data by continent and year, calculate the median vaccination rate
continent_data = filtered_dataset.groupby(['Continent', 'Year'])['DTP3 (% of one-year-olds immunized)'].median().reset_index()

# Create a custom colour dictionary
color_map = {'America': 'red', 'Antarctica': 'grey', 'Oceania':'green'}

# Create a line chart of the median coverage by continent and year
fig = px.line(continent_data,
              x='Year',
              y='DTP3 (% of one-year-olds immunized)',
              color='Continent',
              title='DTP3 Vaccination Coverage by Continent, 1980 to 2020',
              color_discrete_map=color_map)  # Apply custom colours

# Label the axes
fig.update_layout(
    xaxis_title='Year',
    yaxis_title='Percentage of one-year-olds immunised with DTP3 (median)',
)


# Show the graph
fig.show()
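
`color_discrete_map` only affects categories whose names match its keys exactly; any other category falls back to Plotly's default colour sequence. A quick way to spot mismatches is to compare the keys with the labels in the data. In the sketch below the set of continent labels is hypothetical; in the notebook it would come from `continent_data['Continent'].unique()`.

```python
color_map = {'America': 'red', 'Antarctica': 'grey', 'Oceania': 'green'}

# Hypothetical labels, standing in for set(continent_data['Continent'].unique())
present = {"Africa", "Americas", "Asia", "Europe", "Oceania"}

# Keys that match no category in the data (these entries have no effect)
unused_keys = sorted(set(color_map) - present)

# Categories that will receive a default colour instead of a custom one
unmapped = sorted(present - set(color_map))

print("Unused keys:", unused_keys)
print("Unmapped categories:", unmapped)
```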


📊 Visualisation Strategy Justification

Considering the nature of this project focusing on the DTP3 vaccine sales strategy, we have opted for a visualisation that allows a detailed understanding of the relationship between vaccination coverage, income and regions. The line chart grouped by region is particularly suitable for this purpose.

The choice of a line graph is based on its ability to represent and compare data across different categories, in this case, regions. This type of chart will allow XYZ Pharmaceuticals analysts to effectively identify differences and similarities in DTP3 vaccine sales strategy in various parts of the world. Distinctive colours for each region will facilitate visual interpretation and data comparison.

In addition, as sales strategy can be influenced by economic factors such as GDP per capita, the line graph will allow for the identification of any relationships or trends between these factors and sales strategy. Grouping by region will also allow exploration of how economic factors may affect sales decisions in different parts of the world.

# Graph 3 - Positive Ecological Impact of Vegan Products

🧑‍💻 Project: Demonstrating the Positive Ecological Impact of Vegan Products

🗄️ Dataset: Freshwater Withdrawals per Kilogram (Poore & Nemecek, 2018).

📑Brief Description of the Dataset.

The Freshwater Withdrawals per Kilogram dataset provides information on the freshwater withdrawal required to produce one kilogram of a food product. This metric is an important measure for assessing the environmental impact of food production. The data is based on a comprehensive meta-analysis of food system impact studies conducted by Poore & Nemecek in 2018.

❓Research questions
Why: To demonstrate visually and convincingly how vegan products reduce water consumption and environmental impact compared to non-vegan products.

Research Question: How does the freshwater withdrawal needed to produce vegan food compare to that of non-vegan food, and how is this positive ecological impact reflected in the data?

## 📈Selected Visualisation Strategy

To visualise the difference in freshwater withdrawal between vegan and non-vegan foods, we have opted for a horizontal bar chart. In this graph, we will use emojis to more clearly and visually represent the amount of water withdrawn per kilogram of product. Vegan and non-vegan foods will be grouped together, and both an overall average and category-specific examples will be included.

In [1]:
import pandas as pd
import plotly.express as px

# Create the DataFrame with your data, including custom colours
data = {
    'Food': ['Dairy Cattle', 'Cheese', 'Fish', 'Milk', 'Apple', 'Chocolate', 'Oats', 'Vegetable Milk', 'Tofu', 'Wheat & Rye', 'Vegan', 'Meat & Dairy'],
    'Category': ['Meat & Dairy', 'Meat & Dairy', 'Meat & Dairy', 'Meat & Dairy', 'Vegan', 'Vegan', 'Vegan', 'Vegan', 'Vegan', 'Vegan', 'Vegan', 'Meat & Dairy'],
    'Freshwater withdrawals per kilogramme': [2725.30, 5605.20, 3691.30, 628.20, 180.10, 540.60, 482.40, 27.80, 148.60, 647.50, 215.70, 1799.30],
    'Color': ['#F5A038', '#F5A038', '#F5A038', '#F5A038', '#8D7AE6', '#8D7AE6', '#8D7AE6', '#8D7AE6', '#8D7AE6', '#8D7AE6', '#5E4AB2', '#DB5B0B'],
    'Emoji': ['🐄', '🧀', '🐟', '🥛', '🍎', '🍫', '🥣', '🌰', '🫘', '🌾', '🌱', '🍖']
}
df_vegan = pd.DataFrame(data)

# Create the interactive visualisation with emojis and customised colours
fig = px.bar(df_vegan, y='Food', x='Freshwater withdrawals per kilogramme', text='Emoji',
             title='Freshwater Withdrawals per Kilogram: Vegan vs Meat & Dairy', orientation='h')

# Assign custom colours using the 'Color' column
fig.update_traces(marker_color=df_vegan['Color'])

# Label the axes and order the bars by value
fig.update_layout(
    xaxis_title='Freshwater Extraction (litres per kg)',
    yaxis_title='Food',
    hovermode='y',
    yaxis=dict(categoryorder='total ascending'),  # Order bars by freshwater withdrawal
)


# Display graphic
fig.show()
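
The chart above orders bars purely by value. If the bars should instead be grouped by category, one option is to compute the order with pandas and hand it to Plotly through the axis attributes `categoryorder='array'` and `categoryarray`. The sketch below reuses the values from the cell above and assumes that 'Apple' belongs to the 'Vegan' category; the short column name `Water` is introduced only for this example.

```python
import pandas as pd

foods = pd.DataFrame({
    "Food": ["Dairy Cattle", "Cheese", "Fish", "Milk", "Apple", "Chocolate", "Oats",
             "Vegetable Milk", "Tofu", "Wheat & Rye", "Vegan", "Meat & Dairy"],
    # Assumption: 'Apple' is treated as a Vegan item here
    "Category": ["Meat & Dairy"] * 4 + ["Vegan"] * 7 + ["Meat & Dairy"],
    "Water": [2725.30, 5605.20, 3691.30, 628.20, 180.10, 540.60, 482.40,
              27.80, 148.60, 647.50, 215.70, 1799.30],
})

# Sort by category first, then by value within each category
order = foods.sort_values(["Category", "Water"])["Food"].tolist()
print(order)

# The list could then be applied to an existing figure, for example:
# fig.update_layout(yaxis=dict(categoryorder='array', categoryarray=order))
```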


📊Justification of the Visualisation Strategy.

The choice of the horizontal bar graph with emojis is based on the need to communicate the positive environmental impact of GreenEats vegan products effectively. This strategy seeks to highlight the disparities in freshwater withdrawal per kilogram of product in a way that is accessible and appealing to GreenEats' target audience.

The emojis in the graphic add a friendly and engaging visual element, capturing the audience's attention. Each emoji denotes a food item, simplifying the interpretation of differences between categories. This visual representation makes the data tangible and memorable, reinforcing GreenEats' key message.

The horizontal bar approach facilitates comparison between the amount of water used in vegan and non-vegan products. The horizontal bars highlight the differences in water consumption and provide a natural framework for comparing values. In addition, the size of the bars proportionally reflects the amount of water used, emphasising the benefits of vegan products.
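
The size of that gap can also be stated as a single ratio. A short calculation using the two category-level figures from the chart data (1799.30 for 'Meat & Dairy' and 215.70 for 'Vegan', in litres per kg):

```python
meat_dairy_avg = 1799.30  # litres per kg, 'Meat & Dairy' bar in the chart data
vegan_avg = 215.70        # litres per kg, 'Vegan' bar in the chart data

# How many times more freshwater the Meat & Dairy category withdraws per kilogram
ratio = meat_dairy_avg / vegan_avg
print(f"Meat & Dairy withdraws about {ratio:.1f}x the freshwater of Vegan per kg")
```

On these figures the 'Meat & Dairy' category withdraws roughly eight times as much freshwater per kilogram as the 'Vegan' category.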

The colours of the bars also serve a specific purpose. Differentiated colours for vegan and non-vegan products not only provide a pleasing visual appearance, but also allow instant identification of categories. The chosen colours can reflect the GreenEats philosophy and brand, establishing visual consistency with its focus on sustainability.

In short, the visualisation strategy employing emojis, horizontal bars and distinctive colours has been designed to effectively convey the message of the positive environmental impact of GreenEats vegan products. This strategy seeks not only to inform, but also to generate a visual and emotional impact on the audience, fostering a deeper understanding of the influence of vegan products on freshwater conservation.

# Learnings
We confirmed the importance of adapting our visualisation strategies to the needs and characteristics of the audience. In the case of the UN, the choice of a line graph allowed us to compare trends over time and across regions, highlighting global diversity. For GreenEats, a horizontal bar chart with emojis was the ideal approach to compare data between vegan and non-vegan foods in an impactful and accessible way. And for XYZ Pharmaceuticals, a line chart coloured by region made regional trends in DTP3 coverage easy to compare, which gives the sales question a concrete starting point.

We learned that the rationale for our choices is critical for clients to understand the focus and relevance of the proposed visualisations. Aligning the visual elements with the values and objectives of the clients, such as the global representation of the UN or the GreenEats philosophy, reinforces the impact of the visualisations.

Finally, we recognise the importance of flexibility and the willingness to adjust our strategies according to emerging needs. The inclusion of dynamic visualisations and consideration of multiple metrics allows for deeper exploration of data and meets the changing needs of clients.