# Lab: Axes manipulation {#sec-axes-manipulation}

One way to create potentially misleading visualisations is by manipulating the axes of a plot. Here we illustrate this using one of the FiveThirtyEight data sets, which are available [here](https://data.fivethirtyeight.com).

## Data wrangling

We are going to use poll averages from the 2020 US presidential election. As before, we load and examine the data.

In [None]:
import pandas as pd 
import seaborn as sns
import altair as alt 

df_polls = pd.read_csv('data/presidential_poll_averages_2020.csv')
df_polls.head()

For our analysis, we are going to pick estimates from 11/3/2020 for the swing states of Florida, Texas, Arizona, Michigan, Minnesota and Pennsylvania.

In [None]:
# Keep only the final (election day) estimates
df_nov = df_polls[df_polls['modeldate'] == '11/3/2020']

# Keep only the two main candidates
candidates = ['Joseph R. Biden Jr.', 'Donald Trump']
df_nov = df_nov[df_nov['candidate_name'].isin(candidates)]

# Keep only the swing states of interest
swing_states = ['Florida', 'Texas', 'Arizona', 'Michigan', 'Minnesota', 'Pennsylvania']
df_swing = df_nov[df_nov['state'].isin(swing_states)]

df_swing

## Default barplot

We can look at the relative performance of the candidates within each state using a nested bar plot.

In [None]:
ax = sns.barplot(
    data = df_swing, 
    x = 'state', 
    y = 'pct_estimate', 
    hue = 'candidate_name')

## Altering the axes

Narrowing the y-axis range so that it no longer starts at zero exaggerates the difference between the bars. Some might say that is misleading.

In [None]:
ax = sns.barplot(
    data = df_swing, 
    x = 'state', 
    y = 'pct_estimate', 
    hue = 'candidate_name')

ax.set(ylim=(41, 52))

What do you think?
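
One way to put a number on the effect is to compare the heights of the bars as they are actually drawn. The sketch below is only an illustration: `drawn_height` is a hypothetical helper, and the two estimates are made-up values rather than numbers from the polls file.

```python
def drawn_height(value, baseline):
    """Height of a bar as drawn when the y axis starts at `baseline`."""
    return value - baseline

# Hypothetical estimates for two candidates (not taken from the polls file).
a, b = 50.0, 48.0

for baseline in (0, 41):
    ratio = drawn_height(a, baseline) / drawn_height(b, baseline)
    print(f'baseline={baseline}: taller bar is {ratio:.2f}x the shorter one')
```

With a zero baseline the two bars are almost the same height; with the axis starting at 41 the same two-point gap makes one bar visibly taller than the other.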

How about if we instead put the data on the full 0 to 100 scale?

In [None]:
ax = sns.barplot(
    data = df_swing, 
    x = 'state', 
    y = 'pct_estimate', 
    hue = 'candidate_name')

ax.set(ylim=(0, 100))

We can do the same thing in Altair.

In [None]:
alt.Chart(df_swing).mark_bar().encode(
    x='candidate_name',
    y='pct_estimate',
    color='candidate_name',
    column = alt.Column('state:O', spacing = 5, header = alt.Header(labelOrient = "bottom")),
)

Note the `column` encoding built with `alt.Column`. What happens if you leave it out?

Passing the `domain` option to the scale of the y axis lets us choose the y-axis range.

In [None]:
alt.Chart(df_swing).mark_bar().encode(
    x='candidate_name',
    y=alt.Y('pct_estimate', scale=alt.Scale(domain=[42,53])),
    color='candidate_name',
    column = alt.Column('state:O', spacing = 5, header = alt.Header(labelOrient = "bottom")),
)

## Altering the proportions

We can even be a bit tricky and stretch out the difference by making each panel tall and narrow.

In [None]:
alt.Chart(df_swing).mark_bar().encode(
    x='candidate_name',
    y=alt.Y('pct_estimate', scale=alt.Scale(domain=[42,53])),
    color='candidate_name',
    column = alt.Column('state:O', spacing = 5, header = alt.Header(labelOrient = "bottom")),
).properties(
    width=20,
    height=600
)

## Default line plot

It is not just bar plots that you can have fun with. Line plots are another interesting example.

For our simple line plot, we will need the poll data for a single state.

In [None]:
df_texas = df_polls[
    df_polls['state'] == 'Texas'
]

df_texas_bt = df_texas[
    (df_texas['candidate_name'] == 'Donald Trump') |
    (df_texas['candidate_name'] == 'Joseph R. Biden Jr.')
]

df_texas_bt.head()

The `modeldate` column is a string (object), not a datetime. So we need to change that: we will create a new datetime column called `date`.

In [None]:
print('Before\n')
print(df_texas_bt.dtypes)
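# assign() returns a new dataframe, which avoids pandas' SettingWithCopyWarning on a filtered slice
df_texas_bt = df_texas_bt.assign(date=pd.to_datetime(df_texas_bt['modeldate'], format='%m/%d/%Y'))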
print('\nAfter\n')
print(df_texas_bt.dtypes)
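
To see why the conversion matters, here is a minimal sketch using only the standard library. The date strings are illustrative values in the same month/day/year layout as `modeldate`, not rows read from the polls file.

```python
from datetime import datetime

# Date strings in the same month/day/year layout as `modeldate`
# (illustrative values, not read from the polls file).
raw_dates = ['2/27/2020', '11/3/2020', '10/15/2020']

# Sorting the raw strings compares them character by character,
# so October and November land before February.
sorted_as_text = sorted(raw_dates)

# Parsing with an explicit format gives real dates that sort chronologically.
parsed = [datetime.strptime(d, '%m/%d/%Y') for d in raw_dates]
sorted_as_dates = sorted(parsed)

print(sorted_as_text)
print([d.strftime('%Y-%m-%d') for d in sorted_as_dates])
```

A plotting library that treats the dates as plain text has the same problem: it cannot place them correctly along a time axis.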

Create our line plot.

In [None]:
alt.Chart(df_texas_bt).mark_line().encode(
    y=alt.Y('pct_estimate', scale=alt.Scale(domain=[42,53])),
    x='date',
    color='candidate_name')

Sometimes a separate axis is used for each line, or for each layer in a combined line and bar plot.

The example [here](https://altair-viz.github.io/user_guide/scale_resolve.html) uses a dataframe with a column for each line. Our data does not have that.

In [None]:
# Keep only the columns needed for the reshaped plot
our_df = df_texas_bt[['candidate_name', 'pct_estimate', 'date']]
our_df

`pd.pivot_table` lets us reshape the dataframe so that each candidate gets its own column.

In [None]:
our_df = pd.pivot_table(our_df, index=['date'], columns = 'candidate_name')
our_df.columns = our_df.columns.to_series().str.join('_')
our_df.head()

The date is now the dataframe index. We want it to be a column.

In [None]:
our_df['date1'] = our_df.index
# The pivoted columns are in alphabetical order: 'Donald Trump' first, then 'Joseph R. Biden Jr.'
our_df.columns = ['Trump', 'Biden', 'date1']
our_df.head()
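
Assigning new column names by position works here only because the pivoted columns come out in alphabetical order. As a hedged alternative, the sketch below does the same long-to-wide reshaping on a small toy dataframe and renames the columns by name instead. The values are hypothetical, and `long_df` and `wide_df` are names introduced just for this example.

```python
import pandas as pd

# Toy long-format data with hypothetical values (not taken from the polls file).
long_df = pd.DataFrame({
    'date': pd.to_datetime(['2020-11-01', '2020-11-01', '2020-11-02', '2020-11-02']),
    'candidate_name': ['Donald Trump', 'Joseph R. Biden Jr.'] * 2,
    'pct_estimate': [48.0, 47.0, 48.5, 47.5],
})

# Naming `values` explicitly gives plain (single-level) column labels.
wide_df = (
    long_df
    .pivot_table(index='date', columns='candidate_name', values='pct_estimate')
    .rename(columns={'Donald Trump': 'Trump', 'Joseph R. Biden Jr.': 'Biden'})
    .reset_index()  # move the date index back into an ordinary column
)
wide_df.columns.name = None  # drop the leftover 'candidate_name' label

print(wide_df)
```

Renaming with a dictionary keeps each estimate attached to the right candidate even if the column order changes.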

Now we create our new plot, to fool all those people who expect Trump to win in Texas.

In [None]:
base = alt.Chart(our_df).encode(
        alt.X('date1')
)

line_A = base.mark_line(color='#5276A7').encode(
    alt.Y('Trump', axis=alt.Axis(titleColor='#5276A7'), scale=alt.Scale(domain=[42,53]))
)

line_B = base.mark_line(color='#F18727').encode(
    alt.Y('Biden', axis=alt.Axis(titleColor='#F18727'), scale=alt.Scale(domain=[35,53]))
)

alt.layer(line_A, line_B).resolve_scale(y='independent')

Did you see what I did there?
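
The trick is that each line has its own y-axis domain, so the vertical position of a point no longer tells you which value is larger. A minimal sketch of the arithmetic follows, with `relative_position` as a hypothetical helper and made-up estimates rather than values from the polls file.

```python
def relative_position(value, domain):
    """Fraction of the axis height at which `value` is drawn (0 = bottom, 1 = top)."""
    lo, hi = domain
    return (value - lo) / (hi - lo)

# Hypothetical estimates (not from the polls file): Trump ahead by one point.
trump, biden = 48.0, 47.0

# Same domains as the two layers in the chart: [42, 53] for Trump, [35, 53] for Biden.
trump_pos = relative_position(trump, (42, 53))
biden_pos = relative_position(biden, (35, 53))

print(f'Trump drawn at {trump_pos:.2f} of the axis height')
print(f'Biden drawn at {biden_pos:.2f} of the axis height')
```

Even though the Trump value is larger in this example, the Biden line is drawn higher up because its axis starts lower.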

Of course, mixed-axis plots are rarely pure line plots. More often they combine different kinds of marks, such as bars and lines, each with its own axis. For these and other plotting mistakes, The Economist has a nice article [here](https://medium.economist.com/mistakes-weve-drawn-a-few-8cdd8a42d368). You may want to try some of these plots with this data set or the world indicators dataset from a few weeks ago.