
# ***Measles***

# Introduction
The purpose of examining the Measles dataset is threefold: to compare the vaccination rates of schools in several U.S. states, to give parents who plan to enroll their children in these schools information on the issue, and to look for a meaningful relationship between vaccination rates and states.

# Dataset
The dataset is named "Measles" and comes from the TidyTuesday GitHub repository. Its columns are as follows: <br>


*   state: State where the school is located
*   year: Academic year of the school data
*   name: School name
*   type: Type of school (e.g., public, private, charter)
*   city: City where the school is located
*   county: County where the school is located
*   district: District where the school is located
*   enroll: Number of students enrolled in the school that year
*   mmr: School's measles, mumps, and rubella (MMR) vaccination rate
*   overall: School's overall vaccination rate
*   xrel: Percentage of students exempted from vaccination for religious reasons
*   xmed: Percentage of students exempted from vaccination for medical reasons
*   xper: Percentage of students exempted from vaccination for personal reasons
*   lat, lng: Latitude and longitude of the school



**Question-1)** How are MMR vaccination rates distributed across the U.S. states included in the dataset?
<br> **Results:** <br> 
First, to obtain the required data, the unneeded `district` column was dropped and rows with an MMR rate of "-1" (treated as missing) were removed. Across the states in general, coverage for the MMR vaccines (measles, mumps, rubella) is about 80%. The rate is especially high in kindergartens, which matters because these vaccines should be given at a young age.
<br>**Chart Type**: Scatter Map
<br>**Visual channels used:**
<br> Color: MMR Percentage
<br>Hover Data: School, Type, State, MMR 
<br>**Next Step**: Examine whether high coverage with these vaccines affects the overall vaccination rate.
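
The cleaning step above treats `-1` as a missing-value marker and drops those rows. A minimal sketch of an alternative, shown on a small made-up frame rather than the real CSV: convert the `-1` sentinel to `NaN`, so a school missing one rate is still available for analyses of the other, and pandas skips the missing value when averaging. The helper name `sentinel_to_nan` is hypothetical.

```python
import numpy as np
import pandas as pd


def sentinel_to_nan(df: pd.DataFrame, columns, sentinel=-1) -> pd.DataFrame:
    """Return a copy of df in which `sentinel` in the given columns becomes NaN."""
    out = df.copy()
    out[columns] = out[columns].replace(sentinel, np.nan)
    return out


# Toy rows with made-up values: -1 marks a rate the school did not report.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "mmr": [95.0, -1.0, 88.0],
    "overall": [-1.0, 90.0, 85.0],
})

clean = sentinel_to_nan(toy, ["mmr", "overall"])
print(clean["mmr"].isna().sum())  # 1
print(clean["mmr"].mean())        # 91.5, averaged over reported values only
```

Compared with dropping rows, this keeps school "B" usable for the overall-rate map while still excluding its missing MMR value from any average.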

In [None]:
#@title
import pandas as pd
import plotly.graph_objects as go

# Load the TidyTuesday measles data (2020-02-25 release)
measles = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-25/measles.csv", low_memory=False)

# Drop schools whose MMR rate is -1 (missing value)
measles1 = measles.drop(measles[measles.mmr < 0].index)

# The district column is not needed for the map
mmr_oranı = measles1.drop(columns=['district'])

# Hover text; Plotly renders line breaks as <br>, not \n
mmr_oranı['text'] = ('SCHOOL: ' + measles1['name'].astype(str)
                     + '<br>TYPE: ' + measles1['type'].astype(str)
                     + '<br>STATE: ' + measles1['state']
                     + '<br>MMR: ' + measles1['mmr'].astype(str))

# Custom colorscale: [position, color] pairs from 0 to 1
scl = [[0, "rgb(150,0,90)"], [0.125, "rgb(0, 0, 200)"], [0.25, "rgb(0, 25, 255)"],
       [0.375, "rgb(0, 152, 255)"], [0.5, "rgb(44, 255, 150)"], [0.625, "rgb(151, 255, 0)"],
       [0.75, "rgb(255, 234, 0)"], [0.875, "rgb(255, 111, 0)"], [1, "rgb(255, 0, 0)"]]
fig = go.Figure(data=go.Scattergeo(
        lon = mmr_oranı['lng'],
        lat = mmr_oranı['lat'],
        text = mmr_oranı['text'],
        marker = dict(
        color = mmr_oranı['mmr'],
        colorscale = scl,
        reversescale = True,
        opacity = 0.7,
        size = 2,
        colorbar = dict(
            title = dict(text="MMR Percentage", side="right"),
            nticks=10,
            ticklen=1, 
            tickwidth=1,
            showticklabels=True,
            tickangle=0, 
            tickfont_size=10,
            outlinecolor = "rgba(68, 68, 68, 0)",
            ticks = "inside",
            showticksuffix = "last",
            dtick = 10
        )
        )
))

fig.update_layout(
    geo = dict(
        scope = 'north america',
        showland = True,
        landcolor = "rgb(212, 212, 212)",
        subunitcolor = "rgb(255, 255, 255)",
        countrycolor = "rgb(255, 255, 255)",
        showlakes = True,
        lakecolor = "rgb(255, 255, 255)",
        showsubunits = True,
        showcountries = True,
        resolution = 50,
        projection = dict(
            type = 'conic conformal',
            rotation_lon = -100
        ),
        lonaxis = dict(
            showgrid = True,
            gridwidth = 0.5,
            range= [ -140.0, -55.0 ],
            dtick = 5
        ),
        lataxis = dict (
            showgrid = True,
            gridwidth = 0.5,
            range= [ 20.0, 60.0 ],
            dtick = 5
        )
    ),
    title='MMR',
)

fig.show()

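
Plotly renders hover text as HTML-like markup, so line breaks inside `text` must be written as `<br>`; a `\n` is not shown as a new line. A small sketch of building the hover strings with pandas, assuming the column names used in the cell above; `fillna` keeps a missing school name or type from turning the whole label into `NaN`. The function name `hover_text` and the toy rows are hypothetical.

```python
import pandas as pd


def hover_text(df: pd.DataFrame, value_col: str) -> pd.Series:
    """Build one hover label per row; '<br>' is the line break Plotly renders."""
    return (
        "SCHOOL: " + df["name"].fillna("unknown").astype(str)
        + "<br>TYPE: " + df["type"].fillna("unknown").astype(str)
        + "<br>STATE: " + df["state"].astype(str)
        + "<br>" + value_col.upper() + ": " + df[value_col].astype(str)
    )


# Made-up rows; the second one lacks a name and a type.
toy = pd.DataFrame({
    "name": ["Lincoln Elementary", None],
    "type": ["Public", None],
    "state": ["Ohio", "Texas"],
    "mmr": [97.5, 91.0],
})

labels = hover_text(toy, "mmr")
print(labels.iloc[0])
# SCHOOL: Lincoln Elementary<br>TYPE: Public<br>STATE: Ohio<br>MMR: 97.5
```

The same helper would serve the overall-rate map by passing `"overall"` as `value_col`.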


**Question-2)** To what extent do MMR vaccination rates affect overall vaccination rates?
<br> **Results:** <br> 
To obtain the required data, the unneeded `district` column was dropped and rows with an overall rate of "-1" were removed. The map shows that overall vaccination rates are reported for relatively few schools in the dataset. Compared with the overall rates, MMR rates in kindergartens are higher, suggesting that kindergartens place more importance on MMR vaccination. In other schools, higher MMR rates appear to go together with higher overall rates, although this does not hold for every school.
<br>**Chart Type**: Scatter Map
<br>**Visual channels used:**
<br> Color: Overall Vaccination Percentage
<br>Hover Data: School, Type, State, Overall 
<br>**Next Step**: Examine the reasons behind lower vaccination rates.
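
The comparison above is read off two separate maps. A sketch of putting numbers on it, assuming `-1` marks a missing rate as in the cleaning step: keep only schools that report both rates, then compute the share of such schools and the Pearson correlation between `mmr` and `overall`. The toy values and the helper name `compare_rates` are made up. A correlation describes how the two rates vary together; it cannot show that MMR coverage causes the overall rate.

```python
import pandas as pd


def compare_rates(df: pd.DataFrame) -> tuple[float, float]:
    """Return (share of rows reporting both rates, Pearson r of mmr vs overall)."""
    both = df[(df["mmr"] >= 0) & (df["overall"] >= 0)]
    share = len(both) / len(df)
    r = both["mmr"].corr(both["overall"])
    return share, r


# Made-up schools; -1 means the rate was not reported.
toy = pd.DataFrame({
    "mmr":     [98.0, 95.0, 90.0, 85.0, -1.0, 97.0],
    "overall": [97.0, 93.0, 88.0, 80.0, 92.0, -1.0],
})

share, r = compare_rates(toy)
print(round(share, 2))  # 0.67: 4 of the 6 toy schools report both rates
print(round(r, 3))
```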
 


In [None]:
#@title

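# Drop schools whose overall rate is -1 (missing value)
measles2 = measles.drop(measles[measles.overall < 0].index)

# The district column is not needed for the map
overall_oranı = measles2.drop(columns=['district'])

# Hover text; Plotly renders line breaks as <br>, not \n
overall_oranı['text'] = ('SCHOOL: ' + measles2['name'].astype(str)
                         + '<br>TYPE: ' + measles2['type'].astype(str)
                         + '<br>STATE: ' + measles2['state']
                         + '<br>OVERALL: ' + measles2['overall'].astype(str))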

scl = [0,"rgb(150,0,90)"],[0.125,"rgb(0, 0, 200)"],[0.25,"rgb(0, 25, 255)"],\
[0.375,"rgb(0, 152, 255)"],[0.5,"rgb(44, 255, 150)"],[0.625,"rgb(151, 255, 0)"],\
[0.75,"rgb(255, 234, 0)"],[0.875,"rgb(255, 111, 0)"],[1,"rgb(255, 0, 0)"]
fig = go.Figure(data=go.Scattergeo(
        lon = overall_oranı['lng'],
        lat = overall_oranı['lat'],
        text = overall_oranı['text'],
        marker = dict(
        color = overall_oranı['overall'],
        colorscale = scl,
        reversescale = True,
        opacity = 0.7,
        size = 2,
        colorbar = dict(
            title = dict(text="Overall Percentage", side="right"),
            nticks=10,
            ticklen=1, 
            tickwidth=1,
            showticklabels=True,
            tickangle=0, 
            tickfont_size=10,
            outlinecolor = "rgba(68, 68, 68, 0)",
            ticks = "inside",
            showticksuffix = "last",
            dtick = 10
        )
        )
))

fig.update_layout(
    geo = dict(
        scope = 'north america',
        showland = True,
        landcolor = "rgb(212, 212, 212)",
        subunitcolor = "rgb(255, 255, 255)",
        countrycolor = "rgb(255, 255, 255)",
        showlakes = True,
        lakecolor = "rgb(255, 255, 255)",
        showsubunits = True,
        showcountries = True,
        resolution = 50,
        projection = dict(
            type = 'conic conformal',
            rotation_lon = -100
        ),
        lonaxis = dict(
            showgrid = True,
            gridwidth = 0.5,
            range= [ -140.0, -55.0 ],
            dtick = 5
        ),
        lataxis = dict (
            showgrid = True,
            gridwidth = 0.5,
            range= [ 20.0, 60.0 ],
            dtick = 5
        )
    ),
    title='Overall',
)

fig.show()

**Question-3)** What are the total exemption rates in each state, and for which reasons?
<br> **Results:** <br> 
To obtain the required data, the rows were grouped by state and the exemption percentages for each reason (xrel, xmed, xper) were summed. State abbreviations were then added to the data set for the maps, and the totals were plotted as a bar chart. The dataset contains little information on religious exemptions, but it does cover medical and personal exemptions by state. California stands out in particular, with many schools reporting medical exemptions, while personal exemptions are highest in Wisconsin.
<br>**Chart Type**: Bar Chart
<br>**Visual channels used:**
<br> Color: Blue = Religious Reasons, Orange = Medical Reasons, Green = Personal Reasons
<br>x axis = States, y axis = Sum of school-level exemption percentages
<br>**Next Step**: Because there is little data on religious exemptions, examine medical and personal reasons in detail.
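
Summing school-level percentages per state makes a state's bar grow with the number of schools it reports, so a state with many schools can look high even when its typical exemption rate is modest. A sketch of a complementary view on made-up data, using pandas named aggregation: the per-state sum, the per-state mean, and the number of reporting schools side by side. The states "Big" and "Small" are hypothetical, and which view to plot is a judgment call rather than the notebook's method.

```python
import pandas as pd

# Made-up data: "Big" has many schools with low rates, "Small" one school with a high rate.
toy = pd.DataFrame({
    "state": ["Big", "Big", "Big", "Big", "Small"],
    "xmed":  [1.0, 1.0, 1.0, 1.0, 3.0],
})

per_state = toy.groupby("state").agg(
    xmed_sum=("xmed", "sum"),
    xmed_mean=("xmed", "mean"),
    schools=("xmed", "count"),  # count skips NaN, so this counts reporting schools
)
print(per_state)
```

On this toy data the sum ranks "Big" first while the mean ranks "Small" first, which is the distortion the mean is meant to expose.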

In [None]:
#@title
import plotly.express as px
# Keep only the state and the three exemption columns
dataFrame = measles[["state", "xrel", "xmed", "xper"]]

# Sum the school-level exemption percentages per state
df1 = dataFrame.groupby(['state']).agg({'xrel': 'sum', 'xmed': 'sum', 'xper': 'sum'})

# Abbreviations in the order groupby returns the states (alphabetical by full name)
address = ['AZ','AR','CA','CO','CT','FL','ID','IL','IA','ME','MA','MI','MN','MO','MT','NJ','NY','NC','ND','OH','OK','OR','PA','RI','SD','TN','TX','UT','VT','VA','WA','WI']
assert len(address) == len(df1), "state list and abbreviations are out of sync"
df1['Adress'] = address

# Move 'state' from the index back into a column for plotly express
modified = df1.reset_index(level='state')

# Bar chart of the summed exemption percentages per state
df1[['xrel', 'xmed', 'xper']].plot.bar(figsize=(14, 5), ylabel='Sum of exemption percentages')
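
The cell above attaches abbreviations by position, which works only while `groupby` returns exactly these 32 states in alphabetical order of their full names. A sketch of a lookup-based alternative that does not depend on row order and leaves any unmatched name as `NaN` instead of silently mislabeling it. The dictionary here is shortened to three states for illustration (an assumption; it would need every state in the data), and `totals` is a made-up stand-in for `df1`.

```python
import pandas as pd

# Partial lookup table for illustration; extend it with every state in the data.
STATE_ABBREV = {"Arizona": "AZ", "California": "CA", "Wisconsin": "WI"}

# Made-up per-state totals, deliberately not in alphabetical order.
totals = pd.DataFrame(
    {"xmed": [2.0, 5.0, 1.0]},
    index=pd.Index(["California", "Wisconsin", "Arizona"], name="state"),
)
totals["Adress"] = totals.index.map(STATE_ABBREV)

# Any state missing from the lookup shows up here instead of being mislabeled.
unmatched = totals.loc[totals["Adress"].isna()].index.tolist()
print(totals["Adress"].tolist())  # ['CA', 'WI', 'AZ']
print(unmatched)                  # []
```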


**Question-4)** To examine the data in more detail, how are medical and personal exemptions distributed on a map of the United States?
<br> **Results:** <br> 
For medical exemptions, California, Ohio, and Michigan stand out as outliers, which may indicate health problems in these states.
For personal exemptions, Wisconsin and Washington have high totals. These two states also appear to have similar income levels.
<br>**Chart Type**: Choropleth Map
<br>**Visual channels used:**
<br> Color: Summed medical exemption percentage (first map) and summed personal exemption percentage (second map)
<br>**Next Step**: Examine school types to learn about the income levels of the unvaccinated population.
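
The results above call some states outliers without stating a rule. One common, explicit choice is Tukey's fences: flag values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where IQR = Q3 - Q1. A minimal sketch on made-up state totals; the states "A" to "E" are hypothetical, the 1.5 multiplier is a convention rather than a property of the data, and applying the rule to the real `xmed` or `xper` totals may or may not flag the same states named in the text.

```python
import pandas as pd


def tukey_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries of `values` outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]


# Made-up per-state totals.
state_totals = pd.Series(
    {"A": 10.0, "B": 12.0, "C": 11.0, "D": 13.0, "E": 60.0},
    name="xmed",
)
print(tukey_outliers(state_totals).index.tolist())  # ['E']
```

Here Q1 = 11 and Q3 = 13, so the upper fence is 16 and only "E" lies outside it.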

In [None]:
#@title
fig = px.choropleth(modified,locations='Adress', locationmode="USA-states", color='xmed', title="Medical Reasons", scope="usa",hover_data=['state'])

fig.update_layout(
    title={
        'text': "Medical Reasons",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})


fig.show()

fig = px.choropleth(modified,locations='Adress', locationmode="USA-states", color='xper',title="Personal Reasons", scope="usa",hover_data=['state'])

fig.update_layout(
    title={
        'text': "Personal Reasons",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})


fig.show()


**Question-5)** Are exemption rates related to income level, and can school types provide information about income levels?
<br> **Results:** <br> 
To obtain the required data, a bubble map was used, and school types were examined as far as the data set allows by toggling them in the legend. The kindergarten data are consistent. However, private schools are quite dense in Ohio (an outlier for medical exemptions) and California. This concentration suggests that higher income levels may go together with higher exemption rates, although the map alone cannot confirm a direct relationship.
<br>**Chart Type**: Bubble Map
<br>**Visual channels used:**
<br> Color: School type (traces can be toggled by clicking on the legend)
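
The bubble map shows where private schools cluster but does not measure their link to exemption rates. A sketch of one way to quantify it, on made-up rows: compute each state's share of private schools with `pd.crosstab(..., normalize="index")`, join it to the state's mean personal-exemption rate, and report a Spearman rank correlation (computed here as the Pearson correlation of the ranks, so SciPy is not needed). School type is only an indirect proxy for income, and a correlation across a few dozen states cannot establish cause.

```python
import pandas as pd

# Made-up schools: state, school type, and personal-exemption percentage.
toy = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "type":  ["Private", "Private", "Public", "Public", "Public",
              "Private", "Public", "Public"],
    "xper":  [6.0, 5.0, 4.0, 1.0, 2.0, 3.0, 2.0, 4.0],
})

# Share of each school type within each state (each row sums to 1).
type_share = pd.crosstab(toy["state"], toy["type"], normalize="index")
mean_xper = toy.groupby("state")["xper"].mean()

summary = pd.DataFrame({
    "private_share": type_share["Private"],
    "mean_xper": mean_xper,
})

# Spearman correlation = Pearson correlation of the ranks.
rho = summary["private_share"].rank().corr(summary["mean_xper"].rank())
print(summary)
print(round(rho, 2))  # 1.0 on this toy data
```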

In [None]:
#@title

# School type and location for every school that reports a type
df_type = measles.loc[:, ['state', 'type', 'lat', 'lng']].dropna(subset=['type']).copy()
df_type['text'] = df_type['state'] + ', ' + df_type['type'].astype(str)

# One trace per school type so each can be toggled from the legend
limits = ['Public', 'Charter', 'Kindergarten', 'Nonpublic', 'Private']
colors = ["royalblue", "crimson", "lightseagreen", "orange", "lightgrey"]

fig = go.Figure()

for school_type, color in zip(limits, colors):
    df_sub = df_type[df_type['type'] == school_type]
    fig.add_trace(go.Scattergeo(
        lon = df_sub['lng'],
        lat = df_sub['lat'],
        text = df_sub['text'],
        marker = dict(
            size=8,  # same size for every type; size does not encode a data value here
            opacity=0.75,
            color = color,
            line_color='rgb(40,40,40)',
            line_width=0.5,
        ),
        name = school_type))

fig.update_layout(
        title_text = 'School Types<br>(Click legend to toggle traces)',
        showlegend = True,
        geo = dict(
            scope = 'usa',
            landcolor = 'rgb(217, 217, 217)',
        )
    )

fig.show()

# Conclusion
The Measles dataset says more about the overall vaccination rates of schools in U.S. states than about measles itself. Taken together, the charts suggest that kindergartens are more careful about vaccinating against diseases such as measles, mumps, and rubella, which should be vaccinated against at an early age. In schools with lower vaccination rates, personal and medical reasons are the most common exemptions. One would expect vaccination rates to rise with income, but the opposite appears here: in states such as California and Ohio, many parents with high incomes enroll their children in private schools, yet these states also show high exemption rates, which may reflect health problems or vaccinations skipped for personal reasons.
<br> <br>Thank you,
<br> Mahmut Kaan Molla

