# Tobacco Use Trends from 1994 to 2010

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
import seaborn as sns
warnings.filterwarnings("ignore")

**Exploring our data**

In [None]:
tb = pd.read_csv('../input/tobacco.csv')
print(tb.head())

In [None]:
tb.info()

**For ease of analysis, let us first remove the % signs from the data. This is a fairly straightforward dataset, so that should be all the data cleaning we need to do.**

In [None]:
columns = ['Smoke everyday', 'Smoke some days', 'Former smoker', 'Never smoked']

# Strip the % sign from each percentage column and convert it to float
for col in columns:
    tb[col] = tb[col].str.strip('%').astype('float')

In [None]:
tb.head()
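
If some cells turn out to have stray whitespace or missing values, a slightly more defensive conversion may help. The following is a minimal sketch on made-up rows, not the real data; the helper name `percent_to_float` is hypothetical, and `pd.to_numeric(..., errors='coerce')` turns anything unparseable into NaN instead of raising an error.

```python
import pandas as pd

# Made-up rows that mimic the percent-formatted columns; not the real data.
sample = pd.DataFrame({
    'Smoke everyday': ['20.5%', '18.0%', None],
    'Never smoked': ['50.1%', ' 52.3% ', '55%'],
})

def percent_to_float(series):
    """Strip whitespace and a trailing % sign, then convert to float (NaN if unparseable)."""
    stripped = series.str.strip().str.rstrip('%')
    return pd.to_numeric(stripped, errors='coerce')

# Apply the helper to every column of the sample frame
cleaned = sample.apply(percent_to_float)
print(cleaned)
```

In the notebook itself, the same helper could be applied as `tb[columns] = tb[columns].apply(percent_to_float)`, assuming `columns` is the list defined above.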

# Analysis

In [None]:
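# Average the numeric columns per year; numeric_only avoids errors on text columns such as State
tb_group = tb.groupby('Year', as_index = False).mean(numeric_only = True)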

fig = plt.figure(figsize = (8,6))
ax1 = fig.add_subplot(2,2,1)
ax2 = fig.add_subplot(2,2,2)
ax3 = fig.add_subplot(2,2,3)
ax4 = fig.add_subplot(2,2,4)

print(tb_group.head())

y = 'Percentage of people'
x = 'Year'

ax1.set(title = 'Smoke everyday', ylabel = y, xlabel = x)
ax2.set(title = 'Smoke some days', ylabel = y, xlabel = x)
ax3.set(title = 'Former smoker', ylabel = y, xlabel = x)
ax4.set(title = 'Never smoked', ylabel = y, xlabel = x)
ax1.scatter(tb_group.Year, tb_group['Smoke everyday'])
ax2.scatter(tb_group.Year, tb_group['Smoke some days'])
ax3.scatter(tb_group.Year, tb_group['Former smoker'])
ax4.scatter(tb_group.Year, tb_group['Never smoked'])

fig.tight_layout()
fig.autofmt_xdate()
plt.show()

**As the plots above show, the percentage of people who smoke every day has gone down overall. More people in 2010 report never having smoked, noticeably up from 1994.**
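
To put a number on the trend, one option is to subtract the first year's average from the last year's. The sketch below uses a small made-up table named `yearly` as a stand-in for `tb_group`; its values are illustrative only and are not results from the dataset.

```python
import pandas as pd

# Made-up yearly averages for illustration only; not taken from the dataset.
yearly = pd.DataFrame({
    'Year': [1994, 2002, 2010],
    'Smoke everyday': [20.0, 18.0, 14.0],
    'Never smoked': [50.0, 53.0, 57.0],
})

# Order by year so the first and last rows are the endpoints
yearly = yearly.sort_values('Year').set_index('Year')

# Difference between the last and first year, in percentage points
change = yearly.iloc[-1] - yearly.iloc[0]
print(change)
```

A negative value means the share fell over the period, and a positive value means it rose.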

In [None]:
from scipy import stats

states = set(tb.State)

slope_dict = {}

for state in states:
    slope, intercept, r_value, p_value, std_err = stats.linregress(tb.Year[tb.State == state], tb['Never smoked'][tb.State == state])
    slope_dict[state] = slope
    
slope_df = pd.DataFrame([slope_dict]).transpose()
slope_df.columns = ['slope']
# DataFrame.sort was removed from pandas; sort_values is the replacement
slope_df.sort_values(by = 'slope', ascending = True, inplace = True)

In [None]:
slope_dict1 = {}

for state in states:
    slope, intercept, r_value, p_value, std_err = stats.linregress(tb.Year[tb.State == state], tb['Smoke everyday'][tb.State == state])
    slope_dict1[state] = slope
    
slope_df1 = pd.DataFrame([slope_dict1]).transpose()
slope_df1.columns = ['slope']
# DataFrame.sort was removed from pandas; sort_values is the replacement
slope_df1.sort_values(by = 'slope', ascending = False, inplace = True)
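
The two loops above fit the same regression to two different columns, so they could share one helper. The following is a minimal sketch on made-up data for two hypothetical states. The name `slopes_by_state` is an assumption, and it swaps `stats.linregress` for `np.polyfit` with degree 1, which should give the same least-squares slope.

```python
import numpy as np
import pandas as pd

# Made-up data for two hypothetical states; not the real survey values.
demo = pd.DataFrame({
    'State': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Year': [1994, 2002, 2010, 1994, 2002, 2010],
    'Never smoked': [50.0, 54.0, 58.0, 52.0, 51.0, 50.0],
})

def slopes_by_state(df, column):
    """Return the least-squares slope of `column` against Year for each state."""
    slopes = {
        # polyfit returns [slope, intercept] for a degree-1 fit
        state: np.polyfit(group['Year'], group[column], 1)[0]
        for state, group in df.groupby('State')
    }
    return pd.Series(slopes, name='slope').sort_values()

never_slopes = slopes_by_state(demo, 'Never smoked')
print(never_slopes)
```

Each slope is in percentage points per year, so a value of 0.5 corresponds to a rise of about 8 points over the 16 years from 1994 to 2010.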

**Below are bar graphs of the fitted regression slopes by state for 'Never smoked' and 'Smoke everyday'. Each slope is the average change in percentage points per year from 1994 to 2010.**

In [None]:
slope_df.plot(kind = 'bar', figsize = (10,6), title = 'Never smoked: average yearly change (regression slope), 1994 to 2010')
slope_df1.plot(kind = 'bar', figsize = (10,6), title = 'Smoke everyday: average yearly change (regression slope), 1994 to 2010')
plt.show()

## Final Thoughts

**From the graphs above, we can see that although people overall became less likely to smoke, in some states people became more likely to smoke or changed little since 1994.**

**For example, in Washington state, California, and Nevada, people were a lot more likely to have never smoked in 2010 than in 1994, while in Washington, DC, Minnesota, and especially Oklahoma, people were a lot more likely to have smoked in 2010 than in 1994.**