Thanks to this dataset, we can now analyse what tobacco usage in the USA looks like. Let's import the data into a pandas dataframe. Only the first 6 columns of the data are needed for the analysis.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import pandas as pd
tobaccoUseData = pd.read_csv('../input/tobacco-use/tobacco.csv', usecols=[0, 1, 2, 3, 4, 5])

As part of data preparation, the following steps are performed:
1. Rename the columns for the sake of readability. This step is optional; I am just experimenting with pandas dataframes.
2. Remove the "%" symbol from the percentage columns and convert them to numeric values.
3. Check for null values in the dataset (via the non-null counts reported by `info()`).

In [None]:
tobaccoUseData = tobaccoUseData.rename(columns={'Year':'year',
                                            'State':'state',
                                            'Smoke everyday':'dailySmoker', 
                                            'Smoke some days':'nonDailySmoker', 
                                            'Former smoker':'formerSmoker', 
                                            'Never smoked':'nonSmoker'})
for col in tobaccoUseData.columns[2:]:
    tobaccoUseData[col] = tobaccoUseData[col].str.rstrip('%')
    tobaccoUseData[col] = pd.to_numeric(tobaccoUseData[col])
tobaccoUseData.info()
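
The null check can also be made explicit. The following is only a minimal sketch on a made-up toy frame (the `sample` dataframe and its values are hypothetical and not taken from the dataset); it applies the same "%" stripping as above and then counts the missing values per column with `isnull().sum()`.

```python
import pandas as pd

# Hypothetical toy frame mimicking the renamed columns; the values are made up.
sample = pd.DataFrame({
    'year': [1995, 1995, 2010],
    'state': ['Alabama', 'Alaska', 'Alabama'],
    'dailySmoker': ['20.5%', None, '15.1%'],
    'nonSmoker': ['50.0%', '48.2%', '55.3%'],
})

# Strip the trailing "%" and convert to floats; missing entries become NaN.
for col in ['dailySmoker', 'nonSmoker']:
    sample[col] = pd.to_numeric(sample[col].str.rstrip('%'))

# Count the missing values in every column.
null_counts = sample.isnull().sum()
print(null_counts)
```

Running the same `isnull().sum()` call on `tobaccoUseData` would show which columns, if any, contain gaps.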

We have 4 classes in the dataset, namely dailySmoker, nonDailySmoker, formerSmoker and nonSmoker. The first step is to visualise how these classes change year by year in a line plot.

In [None]:
plt.figure(figsize=(9, 4))
ax = plt.subplot(111)
ax.spines["top"].set_visible(False)    
ax.spines["bottom"].set_visible(False)    
ax.spines["right"].set_visible(False)    
ax.spines["left"].set_visible(False)
ax.get_xaxis().tick_bottom()    
ax.get_yaxis().tick_left()
plt.ylim(0, 90)    
plt.xlim(1995, 2010)

plt.title("Change in smokers [in %]", fontsize=16)
# Average each class across states per year; numeric_only skips the text 'state' column
yearlyMeans = tobaccoUseData.groupby('year').mean(numeric_only=True)
for col in ['dailySmoker', 'nonDailySmoker', 'formerSmoker', 'nonSmoker']:
    plt.plot(yearlyMeans[col], label=col)  # the label becomes the legend entry
legend = ax.legend(loc='upper left')

Personally, I am very happy to notice the following:
1. A larger percentage of non-smokers, which is increasing every year.
2. A diminishing percentage of daily smokers.

The data is available at state level, so we can visualise the state-wise change in tobacco usage (in percentage points) for the different classes of smokers between 1995 and 2010. We use the seaborn package to draw this as a heatmap.

In [None]:
plt.figure(figsize=(6, 15))
ax = plt.subplot(111)
ax.spines["top"].set_visible(False)    
ax.spines["bottom"].set_visible(False)    
ax.spines["right"].set_visible(False)    
ax.spines["left"].set_visible(False)
ax.get_xaxis().tick_bottom()    
ax.get_yaxis().tick_left()

plt.title("State wise percentage change in smokers", fontsize=16)
# Per-state values for each end year; the change is the 2010 value minus the 1995 value
tobaccoUse2010 = tobaccoUseData.sort_values(['state']).loc[tobaccoUseData['year'] == 2010].groupby(['state']).mean()
tobaccoUse1995 = tobaccoUseData.sort_values(['state']).loc[tobaccoUseData['year'] == 1995].groupby(['state']).mean()
tobaccoUseChangeDF = tobaccoUse2010 - tobaccoUse1995
tobaccoUseChangeDF = tobaccoUseChangeDF[pd.notnull(tobaccoUseChangeDF['dailySmoker'])]

import seaborn as sns
ax = sns.heatmap(tobaccoUseChangeDF.drop(['year'],axis=1))
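
The change shown in the heatmap is the 2010 value minus the 1995 value for each state. As a rough sketch of the same idea, the difference can also be computed by indexing on state and year and taking two cross-sections. The `toy` dataframe and its numbers below are hypothetical and only serve to show the mechanics.

```python
import pandas as pd

# Hypothetical toy data in the same long format (one row per state and year).
toy = pd.DataFrame({
    'year': [1995, 2010, 1995, 2010],
    'state': ['Alabama', 'Alabama', 'Utah', 'Utah'],
    'dailySmoker': [20.0, 16.0, 12.0, 7.0],
    'nonSmoker': [52.0, 55.0, 70.0, 76.0],
})

# Index by (state, year) so that each year can be selected as a cross-section.
indexed = toy.set_index(['state', 'year'])

# 2010 minus 1995, aligned on state; negative values mean a decrease.
change = indexed.xs(2010, level='year') - indexed.xs(1995, level='year')
print(change)
```

With this layout the direction of the subtraction is visible in a single line, which makes it harder to mix up the two years.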

I would also like to see how the current data relates to the information from https://en.wikipedia.org/wiki/List_of_smoking_bans_in_the_United_States.
I have created a simple csv file consolidating the data from the above wiki page. This page lists the types of bans imposed by the states on tobacco usage in public places.

In [None]:
import pandas as pd
tobaccoBanDetails = pd.read_csv('../input/tobacco-ban-details-in-usa-states/TobaccoBanUSAstates.csv')
banGroups = tobaccoBanDetails.groupby(tobaccoBanDetails.banDetails)
#banInPublic
#banInRestu
#banInRestuBar
#banInRestuWork
#banInWork
#noBan
fig = plt.figure()
ax = fig.add_subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
plt.ylim(0, 70)
#df2.plot(kind='bar', color='mediumturquoise', ax=ax, position=1, width=0.25)
# Add an empty-named, all-zero column: it shows up as an extra blank category on the x-axis
tobaccoUseData[''] = 0
colors = ['lightcoral', 'lightseagreen', 'sandybrown', 'mediumseagreen', 'salmon', 'mediumturquoise']
i = -2
for gr in banGroups.groups:
    df = tobaccoUseData[tobaccoUseData['state'].isin(
                        banGroups.get_group(gr)['state'])].drop(['year'],axis=1).groupby(['state']).mean().mean()
    df.plot(kind='bar', color=colors[i], ax=ax, position=i, width=0.1)
    plt.text(-0.5, 60-((i+2)*4), gr, color=colors[i], fontweight='bold', fontsize=14)
    i+=1

plt.title("Percentage of smokers arranged by ban imposed by State", fontsize=16)
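
The grouping behind this plot can also be expressed as a merge followed by a group-by. The snippet below is only a sketch under assumptions: the `usage` and `bans` frames and their values are made up, with one row per state, and only the `state` and `banDetails` column names are taken from the code above.

```python
import pandas as pd

# Hypothetical per-state averages; state names and values are made up.
usage = pd.DataFrame({
    'state': ['A', 'B', 'C', 'D'],
    'dailySmoker': [20.0, 18.0, 12.0, 14.0],
    'nonSmoker': [50.0, 52.0, 60.0, 58.0],
})

# Hypothetical ban table with the same columns as the ban csv used above.
bans = pd.DataFrame({
    'state': ['A', 'B', 'C', 'D'],
    'banDetails': ['noBan', 'noBan', 'banInPublic', 'banInPublic'],
})

# Attach the ban type to every state, then average each class per ban type.
merged = usage.merge(bans, on='state', how='inner')
by_ban = merged.groupby('banDetails')[['dailySmoker', 'nonSmoker']].mean()
print(by_ban)
```

The resulting table has one row per ban type, so calling `by_ban.plot(kind='bar')` would give a grouped bar chart without the manual position bookkeeping.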


The bar plot is pretty self-explanatory, but let me highlight some of my observations:
1. The percentage of daily smokers is low in the states where there is a complete ban on tobacco usage in public.
2. There is no noticeable effect of the type of ban on non-daily smokers.
3. A ban in restaurants has a strong effect in minimising daily smokers and maximising non-smokers. This is a positive trend from a simple and sensible ban.

I hope to see additional dimensions added to the tobacco usage data so that further dependencies can be observed.