The Global Adult Tobacco Survey (GATS) provides a comprehensive and internationally recognized dataset on tobacco use among adults in various countries, including India. By analyzing the GATS data specific to India, we gain valuable insights into the prevalence, patterns, and determinants of tobacco consumption among its diverse population.

This analysis focuses on the landscape of tobacco use in India. We look at the prevalence of smoking and smokeless tobacco use, regional variations, socio-demographic factors influencing tobacco consumption, and the health consequences associated with tobacco use.

GATS data allows us to examine the prevalence of tobacco use across age groups, genders, and socio-economic backgrounds, which helps identify vulnerable population segments and disparities in consumption rates. Understanding these variations is essential for developing targeted interventions and policies that address the needs of diverse communities.

Furthermore, we explore the influence of tobacco marketing, packaging, and pricing on consumer behavior, which sheds light on the effectiveness of tobacco control measures. By analyzing the GATS data, we can advocate for evidence-based policies that aim to reduce tobacco initiation and promote cessation.

The health implications of tobacco use are significant and widespread. Through data analysis, we assess the impact of tobacco on health outcomes, including the heightened risk of cardiovascular diseases, respiratory illnesses, and cancer. This analysis underlines the urgent need for comprehensive tobacco control strategies to safeguard public health.

Additionally, we highlight initiatives and programs implemented to combat tobacco use in India, showcasing evidence-based interventions that have yielded positive results. Sharing these success stories can inspire further efforts and collaborations to address the tobacco epidemic effectively.

The insights drawn from the data analysis of tobacco use among Indians according to the Global Adult Tobacco Survey serve as a crucial foundation for evidence-based policymaking, public health campaigns, and targeted interventions. By understanding the complex dynamics of tobacco consumption, we can work towards creating a tobacco-free future for India, promoting health, and ensuring the well-being of its citizens.

In [2]:
# Install the required libraries.
!pip install requests
!pip install bs4

Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1257 sha256=e4e3906b0b68a7d99894239e8a9e477a4109e64a2f653c94c623698f203e41ed
  Stored in directory: /root/.cache/pip/wheels/25/42/45/b773edc52acb16cd2db4cf1a0b47117e2f69bb4eb300ed0e70
Successfully built bs4
Installing collected packages: bs4
Successfully installed bs4-0.0.1


In [3]:
# Import the required libraries: requests, pandas, os, BeautifulSoup, csv, and numpy.
import requests
import pandas as pd
import os
from bs4 import BeautifulSoup as bs
import csv
import numpy as np

In [4]:
# Set a User-Agent header so the request looks like it comes from a browser.
headers = {"User-Agent": "mozilla/5.0"}

# URL of the article that contains the data table.
url = 'https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7942198/'

# Fetch the page with the requests library and keep the HTML as text.
rqd = requests.get(url, headers=headers).text

# Parse the HTML tables with pd.read_html and keep the first one.
# Recent pandas versions expect a file-like object rather than a raw HTML string.
from io import StringIO
dm = pd.read_html(StringIO(rqd))[0]

# pd.read_html already returns DataFrames; keep the table under the name gm.
gm = pd.DataFrame(dm)

# Show the first 10 rows with head().
gm.head(10)

Unnamed: 0_level_0,State,NFHS-5 (2019–20),NFHS-5 (2019–20),GATS (2016–17),GATS (2016–17)
Unnamed: 0_level_1,State,Men,Women,Men,Women
0,Andhra Pradesh,22.6,3.8,30.0,10.1
1,Andaman and Nicobar Islands,58.7,31.3,,
2,Assam,51.8,22.1,62.9,32.9
3,Bihar,48.8,5.0,43.4,6.9
4,"Dadra and Nagar Haveli, Daman and Diu",38.6,2.9,,
5,Goa,18.2,2.6,15.3,4.0
6,Gujarat,41.1,8.7,35.5,10.4
7,Himachal Pradesh,32.3,1.7,30.4,1.7
8,Jammu and Kashmir,38.3,3.6,39.7,6.2
9,Karnataka,27.1,8.5,35.2,10.3


In [5]:
# Check the column names.
gm.columns

MultiIndex([(           'State', 'State'),
            ('NFHS-5 (2019–20)',   'Men'),
            ('NFHS-5 (2019–20)', 'Women'),
            (  'GATS (2016–17)',   'Men'),
            (  'GATS (2016–17)', 'Women')],
           )
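
The two-level header shown above can also be flattened into descriptive names in one step, instead of first numbering the columns and then renaming them. The following is only a sketch on a two-row sample copied from the table; the helper `flatten` and the resulting names such as `Men_GATS` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Rebuild a two-row sample of the scraped table with the same two-level header.
columns = pd.MultiIndex.from_tuples([
    ("State", "State"),
    ("NFHS-5 (2019–20)", "Men"),
    ("NFHS-5 (2019–20)", "Women"),
    ("GATS (2016–17)", "Men"),
    ("GATS (2016–17)", "Women"),
])
demo = pd.DataFrame(
    [["Andhra Pradesh", 22.6, 3.8, 30.0, 10.1],
     ["Assam", 51.8, 22.1, 62.9, 32.9]],
    columns=columns,
)


def flatten(col):
    """Hypothetical helper: ('GATS (2016–17)', 'Men') becomes 'Men_GATS'."""
    survey, group = col
    if survey == group:          # the 'State' column repeats its label
        return group
    return f"{group}_{survey.split()[0]}"  # first word: 'GATS' or 'NFHS-5'


demo.columns = [flatten(c) for c in demo.columns]
print(list(demo.columns))
```

Names derived from the header keep the survey visible in every column, which makes a later mix-up between NFHS-5 and GATS figures less likely.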

In [6]:
# The two-level header is awkward to work with, so replace it with plain positions 0..n-1 using range() and shape.
gm.columns=range(gm.shape[1])

In [7]:
# Checking data with head function.
gm.head()

Unnamed: 0,0,1,2,3,4
0,Andhra Pradesh,22.6,3.8,30.0,10.1
1,Andaman and Nicobar Islands,58.7,31.3,,
2,Assam,51.8,22.1,62.9,32.9
3,Bihar,48.8,5.0,43.4,6.9
4,"Dadra and Nagar Haveli, Daman and Diu",38.6,2.9,,


In [8]:
# Give the columns readable names: map each positional label to a new name, then apply the mapping with rename().
sd = {gm.columns[0]: 'State', gm.columns[1]: 'Men_NFHS', gm.columns[2]: 'Women_NFHS', gm.columns[3]: 'Men', gm.columns[4]: 'Women'}
gm = gm.rename(columns=sd)

In [9]:
# Checking data with head function.
gm.head()

Unnamed: 0,State,Men_NFHS,Women_NFHS,Men,Women
0,Andhra Pradesh,22.6,3.8,30.0,10.1
1,Andaman and Nicobar Islands,58.7,31.3,,
2,Assam,51.8,22.1,62.9,32.9
3,Bihar,48.8,5.0,43.4,6.9
4,"Dadra and Nagar Haveli, Daman and Diu",38.6,2.9,,


In [10]:
# Drop the NFHS-5 column for men; only the GATS figures are kept.
gm.drop(['Men_NFHS'], axis=1, inplace=True)

In [11]:
# Drop the NFHS-5 column for women as well.
gm.drop(['Women_NFHS'], axis=1, inplace=True)

In [12]:
# Checking Data with head function.
gm.head(25)

Unnamed: 0,State,Men,Women
0,Andhra Pradesh,30.0,10.1
1,Andaman and Nicobar Islands,,
2,Assam,62.9,32.9
3,Bihar,43.4,6.9
4,"Dadra and Nagar Haveli, Daman and Diu",,
5,Goa,15.3,4.0
6,Gujarat,35.5,10.4
7,Himachal Pradesh,30.4,1.7
8,Jammu and Kashmir,39.7,6.2
9,Karnataka,35.2,10.3
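
Both NFHS-5 columns can also be removed in a single call that returns a new frame and leaves the original untouched. This is a small sketch on two rows copied from the table; the name `gats_only` is made up for the example.

```python
import pandas as pd

# Two rows from the renamed table, before any column is dropped.
demo = pd.DataFrame({
    "State": ["Goa", "Gujarat"],
    "Men_NFHS": [18.2, 41.1],
    "Women_NFHS": [2.6, 8.7],
    "Men": [15.3, 35.5],
    "Women": [4.0, 10.4],
})

# drop(columns=[...]) removes several columns at once and returns a copy.
gats_only = demo.drop(columns=["Men_NFHS", "Women_NFHS"])
print(gats_only)
```

Because nothing is modified in place, rerunning the cell does not raise a `KeyError` for a column that is already gone.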


In [13]:
# Count the missing values in each column.
gm.isna().sum()

State    0
Men      4
Women    4
dtype: int64
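
The counts say how many values are missing but not where. A boolean mask from `isna()` can list the affected states, as in this sketch on three rows copied from the table (the variable `missing` is just an illustrative name).

```python
import numpy as np
import pandas as pd

# Three rows from the GATS-only table; one state has no GATS figures.
demo = pd.DataFrame({
    "State": ["Andhra Pradesh", "Andaman and Nicobar Islands", "Assam"],
    "Men": [30.0, np.nan, 62.9],
    "Women": [10.1, np.nan, 32.9],
})

# Select the State label of every row whose Men value is missing.
missing = demo.loc[demo["Men"].isna(), "State"].tolist()
print(missing)
```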

In [14]:
# Replace the missing values (NaN) with the string 'NA'.
# Note: this turns the Men and Women columns into text (object dtype).
gm = gm.fillna('NA')

In [15]:
# Check for missing values again.
gm.isna().sum()

State    0
Men      0
Women    0
dtype: int64

In [16]:
# Inspect the data frame as well.
gm.head()

Unnamed: 0,State,Men,Women
0,Andhra Pradesh,30.0,10.1
1,Andaman and Nicobar Islands,NA,NA
2,Assam,62.9,32.9
3,Bihar,43.4,6.9
4,"Dadra and Nagar Haveli, Daman and Diu",NA,NA
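
The comparison between genders mentioned in the introduction can be sketched directly on this table. The example below uses five complete rows copied from the output above and assumes the figures are percentages, so the difference is in percentage points; the `Gap` column and the name `ranked` are illustrative only.

```python
import pandas as pd

# Five states with complete GATS figures, copied from the table above.
demo = pd.DataFrame({
    "State": ["Andhra Pradesh", "Assam", "Bihar", "Goa", "Gujarat"],
    "Men": [30.0, 62.9, 43.4, 15.3, 35.5],
    "Women": [10.1, 32.9, 6.9, 4.0, 10.4],
})

# Difference between the men's and women's figures, rounded to one decimal.
demo["Gap"] = (demo["Men"] - demo["Women"]).round(1)

# Sort so the state with the widest gap comes first.
ranked = demo.sort_values("Gap", ascending=False)
print(ranked)
```

This only works while the columns are numeric, which is one reason to keep NaN in the working frame and write the `NA` label only at export time.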


In [17]:
# Export the cleaned data to a CSV file.
gm.to_csv('Tobacco Consumption India GATS.csv',index=False,header=True)
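
An alternative to storing the string 'NA' in the frame is to leave the gaps as NaN and let `to_csv` write the label through its `na_rep` parameter. The sketch below writes to an in-memory buffer instead of a file so it can run anywhere; the two rows are copied from the table, and reading the text back shows that `read_csv` treats `NA` as missing by default.

```python
import io

import numpy as np
import pandas as pd

# Two rows from the GATS-only table, with the gaps kept as NaN.
demo = pd.DataFrame({
    "State": ["Andhra Pradesh", "Andaman and Nicobar Islands"],
    "Men": [30.0, np.nan],
    "Women": [10.1, np.nan],
})

# Write to an in-memory buffer; na_rep controls how NaN appears in the CSV.
buffer = io.StringIO()
demo.to_csv(buffer, index=False, na_rep="NA")
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back: 'NA' is one of read_csv's default missing markers.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.dtypes)
```

With this approach the Men and Women columns stay numeric throughout, and the exported file still shows `NA` for the states without GATS figures.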