## Introduction

Twitter has become an important communication channel in times of emergency.
The ubiquity of smartphones enables people to announce an emergency they’re observing in real time. Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter.

But it’s not always clear whether a person’s words are actually announcing a disaster.

In [None]:
import numpy as np 
import pandas as pd 

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import plotly.express as px

from wordcloud import WordCloud

import plotly.offline as py
py.init_notebook_mode(connected=True)  # enable offline Plotly rendering in the notebook

import warnings
warnings.filterwarnings('ignore')

In [None]:
train = pd.read_csv("../input/nlp-getting-started/train.csv")

## Statistical Analysis

In [None]:
# Print the first few rows of the training data

train.head()

**Let's see the columns of data:**

* **id** - a unique identifier for each tweet
* **text** - the text of the tweet
* **location** - the location the tweet was sent from (may be blank)
* **keyword** - a particular keyword from the tweet (may be blank)
* **target** - in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)

In [None]:
# Basic information

train.info()

In [None]:
# Describing data

train.describe() 

In [None]:
# Data types of columns

train.dtypes

**Check missing/null values**

In [None]:
train.isnull().sum()
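
Raw null counts are easier to judge as a share of all rows. A minimal sketch of that idea, assuming a small made-up frame in place of `train` (the rows are illustrative only, and `missing_share` is a hypothetical helper name, not part of the original kernel):

```python
import pandas as pd

def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, largest first."""
    return (df.isnull().mean() * 100).sort_values(ascending=False)

# Toy stand-in for `train`; the values are illustrative only
toy = pd.DataFrame({
    "keyword": ["fire", None, "flood", None],
    "location": [None, None, None, "Paris"],
    "text": ["a", "b", "c", "d"],
})

print(missing_share(toy))
```

On the real data the same call would be `missing_share(train)`.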

## Visualization

In [None]:
import missingno as msno
msno.matrix(train)

**Countrywise Distribution**

In [None]:
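# Count tweets per location value. Only values that Plotly recognizes as
# country names will be drawn on the map; other entries are left out.
Loc = train['location'].value_counts()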
fig = px.choropleth(Loc.values, locations=Loc.index,
                    locationmode='country names',
                    color=Loc.values,
                    color_continuous_scale=px.colors.sequential.OrRd)
fig.update_layout(title="Countrywise Distribution")
py.iplot(fig, filename='test')

**Categories of target column**

In [None]:
Tar = train['target'].value_counts()

fig = go.Figure([go.Bar(x=Tar.index, y=Tar)])
fig.update_layout(title = "Target Category")
py.iplot(fig, filename='test')

It seems we have more tweets that are not about a disaster than tweets that are. Quite positive, right?
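
To put a number on the imbalance, the class shares can be computed directly. A minimal sketch, assuming a made-up series in place of `train['target']` (the values are illustrative only):

```python
import pandas as pd

# Toy stand-in for train['target']; the values are illustrative only
target = pd.Series([0, 0, 0, 1, 1], name="target")

# normalize=True returns fractions of the total instead of raw counts
share = target.value_counts(normalize=True)

print(share)
```

On the real data this would be `train['target'].value_counts(normalize=True)`.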

**Most used words in text column**

In [None]:
wordcloud = WordCloud(width = 1000, height = 600, max_font_size = 200, max_words = 150,
                      background_color='white').generate(" ".join(train.text))

plt.figure(figsize=[10,10])
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
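
A word cloud conveys frequency only through font size, so exact counts can be a useful companion. A minimal sketch using the standard library, assuming made-up tweets and a hand-picked stop-word set (`top_words` is a hypothetical helper, and its simple regex tokenizer is an assumption rather than what `WordCloud` does internally):

```python
from collections import Counter
import re

def top_words(texts, n=10, stop_words=frozenset()):
    """Count lowercase word tokens across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        # Keep runs of letters and apostrophes as tokens
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(n)

# Toy tweets; illustrative only
tweets = ["Fire in the forest", "The forest fire is spreading", "I love the sun"]
top = top_words(tweets, n=2, stop_words={"the", "in", "is", "i"})

print(top)
```

On the real data the first argument would be `train.text`.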

**Most frequent keywords**

In [None]:
plt.figure(figsize=(16,8))
plt.title('Most frequent keywords',fontsize=16)
plt.xlabel('keywords')

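# Pass the column as x= (recent seaborn versions do not accept it positionally)
# and take the 15 most frequent keywords with Series.value_counts
sns.countplot(x=train.keyword, order=train.keyword.value_counts().iloc[:15].index, palette=sns.color_palette("PuRd", 15))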

plt.xticks(size=16,rotation=60)
plt.yticks(size=16)
sns.despine(bottom=True, left=True)
plt.show()
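
Frequency alone does not say how often a keyword goes with a real disaster. One possible follow-up is to compute, for each keyword, the number of tweets and the share labelled 1. A minimal sketch, assuming a made-up frame in place of `train` (the rows and the output column names `tweets` and `disaster_share` are illustrative only):

```python
import pandas as pd

# Toy stand-in for `train`; the values are illustrative only
toy = pd.DataFrame({
    "keyword": ["fire", "fire", "flood", "flood", "flood"],
    "target": [1, 0, 1, 1, 1],
})

# Per keyword: how many tweets, and what fraction have target == 1
rate = (toy.groupby("keyword")["target"]
           .agg(tweets="count", disaster_share="mean")
           .sort_values("disaster_share", ascending=False))

print(rate)
```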

**If you like this kernel, please upvote it. If you have any questions or suggestions, leave a comment.**

**I'll be adding more plots. Stay tuned!!**