# Exploratory Data Analysis on Pub Dataset

Let’s assume you are on vacation in the United Kingdom with your friends. For fun, you decide to visit some nearby pubs for drinks, but Google Maps is down.

While searching the internet, you come across https://www.getthedata.com/open-pubs. The site lists the location of every pub (specifically, its latitude and longitude). To impress your friends, you decide to build a web application with the data at hand.

In [10]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt


In [8]:
data_des=pd.read_csv("data_dictionary - Sheet1.csv")
data_des

Unnamed: 0,Field,Possible Values,Comments
0,fsa_id,int,Food Standard Agency's ID for this pub.
1,name,string,Name of the pub.
2,address,string,Address fields separated by commas.
3,postcode,string,Postcode of the pub.
4,easting,int,
5,northing,int,
6,latitude,decimal,
7,longitude,decimal,
8,local_authority,string,Local authority this pub falls under.


In [33]:
df=pd.read_csv("open_pubs.csv", names=['fsa_id','name','address','postcode','easting','northing','latitude','longitude','local_authority'])
df.head()

Unnamed: 0,22,Anchor Inn,"Upper Street, Stratford St Mary, COLCHESTER",CO7 6LW,604749,234404,51.970379,0.979340,Babergh
0,36,Ark Bar Restaurant,"Ark Bar And Restaurant, Cattawade Street, Bran...",CO11 1RH,610194,233329,51.958698,1.057832,Babergh
1,74,Black Boy,"The Lady Elizabeth, 7 Market Hill, SUDBURY, Su...",CO10 2EA,587334,241316,52.038595,0.729915,Babergh
2,75,Black Horse,"Lower Street, Stratford St Mary, COLCHESTER",CO7 6JS,622675,-5527598,\N,\N,Babergh
3,76,Black Lion,"Lion Road, Glemsford, SUDBURY",CO10 7RF,622675,-5527598,\N,\N,Babergh
4,97,Brewers Arms,"The Brewers Arms, Bower House Tye, Polstead, C...",CO6 5BZ,598743,240655,52.028694,0.895650,Babergh
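
The preview above shows the first record (`22`, `Anchor Inn`, ...) sitting in the header position, which suggests the file has no header row of its own and was parsed with the first line as the header. A minimal sketch of a read that keeps that record as data, assuming the column names from the data dictionary and treating the `\N` placeholder as missing. The in-memory `SAMPLE` stands in for `open_pubs.csv` and holds three rows copied from the preview:

```python
import io

import pandas as pd

# Column names taken from the data dictionary shown earlier.
COLUMNS = ["fsa_id", "name", "address", "postcode", "easting",
           "northing", "latitude", "longitude", "local_authority"]

# Three rows from the preview; a raw string keeps the \N placeholder literal.
SAMPLE = r'''22,Anchor Inn,"Upper Street, Stratford St Mary, COLCHESTER",CO7 6LW,604749,234404,51.970379,0.979340,Babergh
75,Black Horse,"Lower Street, Stratford St Mary, COLCHESTER",CO7 6JS,622675,-5527598,\N,\N,Babergh
76,Black Lion,"Lion Road, Glemsford, SUDBURY",CO10 7RF,622675,-5527598,\N,\N,Babergh
'''

# header=None tells pandas the first line is data, not column labels;
# na_values maps the \N placeholder to NaN while parsing.
pubs = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS,
                   na_values=[r"\N"])

print(pubs.dtypes)
```

Because `\N` is handled at read time, `latitude` and `longitude` are parsed as floats and need no later cast.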


In [34]:
df.describe()

Unnamed: 0,22,604749,234404
count,51330.0,51330.0,51330.0
mean,299401.204189,429853.99061,227194.0
std,169358.946123,98556.969786,727745.9
min,36.0,78110.0,-5527598.0
25%,167800.5,361449.0,179243.5
50%,303719.5,428770.0,287251.5
75%,438957.75,509795.75,408942.0
max,597137.0,655277.0,1209661.0
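
The minimum northing of -5527598 in the summary above lies far outside the rest of the distribution, and the same value appears in the preview on the two rows whose latitude and longitude are `\N`. A small sketch for flagging such rows, assuming (the source does not confirm this) that a negative northing is a placeholder for a missing location. The frame holds four rows copied from the preview:

```python
import pandas as pd

pubs = pd.DataFrame({
    "fsa_id": [22, 74, 75, 76],
    "easting": [604749, 587334, 622675, 622675],
    "northing": [234404, 241316, -5527598, -5527598],
})

# Assumption: a negative northing marks a record without a usable location.
suspect = pubs[pubs["northing"] < 0]
print(len(suspect), "of", len(pubs), "rows have a negative northing")
```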


In [35]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51330 entries, 0 to 51329
Data columns (total 9 columns):
 #   Column                                       Non-Null Count  Dtype 
---  ------                                       --------------  ----- 
 0   22                                           51330 non-null  int64 
 1   Anchor Inn                                   51330 non-null  object
 2   Upper Street, Stratford St Mary, COLCHESTER  51330 non-null  object
 3   CO7 6LW                                      51330 non-null  object
 4   604749                                       51330 non-null  int64 
 5   234404                                       51330 non-null  int64 
 6   51.970379                                    51330 non-null  object
 7   0.979340                                     51330 non-null  object
 8   Babergh                                      51330 non-null  object
dtypes: int64(3), object(6)
memory usage: 3.5+ MB


In [36]:
df.shape

(51330, 9)

In [37]:
df.duplicated().sum()

0

In [38]:
df.isnull().sum()

22                                             0
Anchor Inn                                     0
Upper Street, Stratford St Mary, COLCHESTER    0
CO7 6LW                                        0
604749                                         0
234404                                         0
51.970379                                      0
0.979340                                       0
Babergh                                        0
dtype: int64

In [39]:
df.longitude.value_counts()

AttributeError: 'DataFrame' object has no attribute 'longitude'

In [40]:
df.latitude.value_counts()

AttributeError: 'DataFrame' object has no attribute 'latitude'
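
Both lookups fail because the frame's columns are labelled with the first record's values rather than `latitude` and `longitude`. Once the columns carry the data-dictionary names, `value_counts()` on the raw text is a quick way to spot a placeholder such as `\N`. A sketch using the six latitude values from the preview (the frame name `pubs` is illustrative):

```python
import pandas as pd

# Latitude values as they appear in the preview, still stored as text.
pubs = pd.DataFrame({
    "latitude": ["51.970379", "51.958698", "52.038595", r"\N", r"\N", "52.028694"],
})

# value_counts() sorts by frequency, so a repeated placeholder rises to the top.
counts = pubs["latitude"].value_counts()
most_common = counts.index[0]
print(counts.head())
```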

In [41]:
df.replace('\\N',np.nan,inplace=True)

In [42]:
df.isnull().sum()

22                                               0
Anchor Inn                                       0
Upper Street, Stratford St Mary, COLCHESTER      0
CO7 6LW                                          0
604749                                           0
234404                                           0
51.970379                                      767
0.979340                                       767
Babergh                                          0
dtype: int64

In [43]:
df.dropna(inplace=True)
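
`dropna()` with no arguments drops a row if any column is missing. Here only the two coordinate columns contain missing values (767 each), so the result is the same, but restricting the check to those columns makes the intent explicit. A sketch assuming data-dictionary column names, on four rows from the preview:

```python
import numpy as np
import pandas as pd

pubs = pd.DataFrame({
    "name": ["Anchor Inn", "Black Horse", "Black Lion", "Brewers Arms"],
    "latitude": ["51.970379", r"\N", r"\N", "52.028694"],
    "longitude": ["0.979340", r"\N", r"\N", "0.895650"],
})

coords = ["latitude", "longitude"]
cleaned = (
    pubs.replace(r"\N", np.nan)   # turn the placeholder into a real missing value
        .dropna(subset=coords)    # drop only rows that lack a coordinate
        .reset_index(drop=True)   # renumber so the index has no gaps
)
print(cleaned)
```

Returning a new frame instead of using `inplace=True` keeps the original available for comparison.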

In [44]:
df.isnull().sum()

22                                             0
Anchor Inn                                     0
Upper Street, Stratford St Mary, COLCHESTER    0
CO7 6LW                                        0
604749                                         0
234404                                         0
51.970379                                      0
0.979340                                       0
Babergh                                        0
dtype: int64

In [45]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 50563 entries, 0 to 51329
Data columns (total 9 columns):
 #   Column                                       Non-Null Count  Dtype 
---  ------                                       --------------  ----- 
 0   22                                           50563 non-null  int64 
 1   Anchor Inn                                   50563 non-null  object
 2   Upper Street, Stratford St Mary, COLCHESTER  50563 non-null  object
 3   CO7 6LW                                      50563 non-null  object
 4   604749                                       50563 non-null  int64 
 5   234404                                       50563 non-null  int64 
 6   51.970379                                    50563 non-null  object
 7   0.979340                                     50563 non-null  object
 8   Babergh                                      50563 non-null  object
dtypes: int64(3), object(6)
memory usage: 3.9+ MB


In [46]:
df.latitude = df.latitude.astype(float)
df.longitude = df.longitude.astype(float)

AttributeError: 'DataFrame' object has no attribute 'latitude'
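
The cast fails for the same reason as the earlier lookups: no column is named `latitude`. With named columns, `pd.to_numeric(..., errors="coerce")` is one alternative to `astype(float)`. It converts the text to floats and turns unparseable entries such as `\N` into `NaN` in a single step. A sketch on values from the preview, assuming data-dictionary column names:

```python
import pandas as pd

pubs = pd.DataFrame({
    "latitude": ["51.970379", r"\N", "52.028694"],
    "longitude": ["0.979340", r"\N", "0.895650"],
})

coords = ["latitude", "longitude"]
# errors="coerce" replaces anything that is not a number with NaN.
pubs[coords] = pubs[coords].apply(pd.to_numeric, errors="coerce")
print(pubs.dtypes)
```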

In [47]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 50563 entries, 0 to 51329
Data columns (total 9 columns):
 #   Column                                       Non-Null Count  Dtype 
---  ------                                       --------------  ----- 
 0   22                                           50563 non-null  int64 
 1   Anchor Inn                                   50563 non-null  object
 2   Upper Street, Stratford St Mary, COLCHESTER  50563 non-null  object
 3   CO7 6LW                                      50563 non-null  object
 4   604749                                       50563 non-null  int64 
 5   234404                                       50563 non-null  int64 
 6   51.970379                                    50563 non-null  object
 7   0.979340                                     50563 non-null  object
 8   Babergh                                      50563 non-null  object
dtypes: int64(3), object(6)
memory usage: 3.9+ MB


In [48]:
import plotly.express as px
fig = px.scatter_mapbox(df, lat="Latitude", lon="Longitude", hover_name="Name", 
                        hover_data=["Address", "PostCode", "Local_Authority"], zoom=6, height=700, width=900)

fig.update_layout(mapbox_style="open-street-map")
fig.show()

ValueError: Value of 'hover_name' is not the name of a column in 'data_frame'. Expected one of ['22', 'Anchor Inn', 'Upper Street, Stratford St Mary, COLCHESTER', 'CO7 6LW', '604749', '234404', '51.970379', '0.979340', 'Babergh'] but received: Name
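
The error lists the columns the frame actually has, and none of the requested names (`Latitude`, `Name`, `PostCode`, ...) is among them. Even with named columns, the data dictionary spells them in lowercase. A sketch that checks the names before plotting, assuming data-dictionary column names and float coordinates. `missing_columns` and `plot_pubs` are hypothetical helpers, and `plotly` is imported inside the function so the check itself runs without it:

```python
import pandas as pd

HOVER_COLUMNS = ["address", "postcode", "local_authority"]
REQUIRED = ["latitude", "longitude", "name", *HOVER_COLUMNS]


def missing_columns(df, required=REQUIRED):
    """Return the required column names that df does not have."""
    return [col for col in required if col not in df.columns]


def plot_pubs(df):
    """Build the scatter map from the cell above, using lowercase column names."""
    missing = missing_columns(df)
    if missing:
        raise KeyError(f"missing columns: {missing}")
    import plotly.express as px  # imported lazily; only needed for plotting

    fig = px.scatter_mapbox(df, lat="latitude", lon="longitude", hover_name="name",
                            hover_data=HOVER_COLUMNS, zoom=6, height=700, width=900)
    fig.update_layout(mapbox_style="open-street-map")
    return fig


# One row from the preview, with coordinates already stored as floats.
pubs = pd.DataFrame({
    "name": ["Anchor Inn"],
    "address": ["Upper Street, Stratford St Mary, COLCHESTER"],
    "postcode": ["CO7 6LW"],
    "latitude": [51.970379],
    "longitude": [0.979340],
    "local_authority": ["Babergh"],
})

print(missing_columns(pubs))                            # nothing missing
print(missing_columns(pubs.rename(columns=str.title)))  # capitalised names do not match
```

Calling `plot_pubs(pubs).show()` renders the map once the check passes.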