
<h3>A description of the problem to be solved</h3>

The skateboard industry averages $0.5 billion a year and sales are trending upwards (source: https://www.grandviewresearch.com/industry-analysis/skateboard-market). Skateboards alone are a popular and stable product with high demand, before even counting goods related to the skateboarding industry (clothing, accessories, etc.). In this report I'm going to cover the problems that a small business owner faces when trying to open a skateboarding shop in a huge city. So the topic of my research is how to successfully open a skateboarding store in London, with a little help from data analysis. 
In my opinion, there are 3 problems that each new skateshop owner would face:

1) **Where?** One of the toughest questions is choosing the right location. Empirical research (https://www.researchgate.net/publication/228637298_The_Impact_of_Retail_Location_on_Retailer_Revenues_An_Empirical_Investigation) shows how much location matters for success in the retail industry. It matters especially in cities like London, where rents and housing prices are very high. For the skateboarding industry the question is particularly important, because we need to take into account the locations the target audience (skaters) visits most often.

2) **How?** It is crucial to decide what exactly differentiates us from other businesses, that is, to understand the core value proposition of the brand. Given that there is competition in the market and open data is available, it's vital to research what people like and dislike about the existing players in a location and what we can bring to the table as a new shop.

3) **What?** Choosing the right product mix. Skaters are particularly picky about brands, and one of the risks in retail is overstocking brands that customers aren't willing to buy. So we need to analyze customer demand for skateboarding brands in a given area and make product decisions with strong data fundamentals in place.

The audience of this report is potential skateboard shop owners, so answering these questions with data could be quite beneficial: it would allow them to avoid costly mistakes and maximize the probability of success. 

<h4>Data selection for the project.</h4>
Now that we have agreed on the problems we'll be solving, let's look at the data available. 


1. Choosing location <br>
Methodology: 
    - 1.1. The first problem to solve is locating the places the target audience visits most often. What are those? Skateparks, of course. Once we have the skatepark data, the goal is to find locations suitable for a new skateshop as close to these parks as possible. 
    - 1.2. To find these locations we'll look at existing skateshop locations in each London borough. The goal is to decide whether a particular borough has enough skateshops. To do so we'll use a regression (or classification) model: fit it on the existing skateshop data and predict the number of skateshops per borough. If the predicted number is greater than the current number of skateshops in a borough, there is demand for new stores. Conversely, if a borough already has more skateshops than predicted, we should avoid it.
    - 1.3. With the skatepark data and the skateshop model output in place, we'll combine them to find prospective boroughs as close to existing skateparks as possible. This could be treated as a clustering problem, or solved by calculating some type of distance (such as Euclidean). However, given that it's geo data, I'd like to solve it through visualization on a map. 

2. Choosing the differentiation against other skateshops.<br>
Methodology:
     - 2.1. Find open data on customer reviews for all skateshops in London and classify them as good / bad reviews
     - 2.2. Run a simple semantic analysis (word cloud) to understand what makes a good / bad skateshop
  
3. Choosing the product mix for a skateshop.<br>
Methodology:
     - 3.1. Analyze open keyword data on the most popular skateboarding brands
     - 3.2. Understand which of these brand keywords are trending by looking at users' Google searches 
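
Step 1.2 can be sketched as follows. This is only a minimal illustration, not the model used in the project: it fits a one-feature least-squares line in plain Python and flags boroughs where the predicted number of skateshops exceeds the observed one. The borough labels `A` to `E`, the populations and the shop counts are hypothetical placeholders; the real features would come from the borough demography data.

```python
# Hypothetical placeholder data: (borough, population in thousands, observed skateshops).
boroughs = [
    ("A", 200, 2),
    ("B", 250, 3),
    ("C", 300, 1),
    ("D", 350, 4),
    ("E", 400, 5),
]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

xs = [pop for _, pop, _ in boroughs]
ys = [shops for _, _, shops in boroughs]
intercept, slope = fit_line(xs, ys)

# A positive gap means the model predicts more shops than currently exist,
# which is read here as possible unmet demand.
gaps = {name: intercept + slope * pop - shops for name, pop, shops in boroughs}
underserved = [name for name, gap in gaps.items() if gap > 0]
print(underserved)
```

With real data the single feature would be replaced by several demographic columns and a proper regression library, but the decision rule (predicted minus observed) stays the same.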



<h2>Finding most frequently visited locations by target audience</h2>

Data description:
For the problem described in **1.1** I need to look into Foursquare skatepark data for London.
Prerequisites: to use the venues as geo data I need to retrieve their coordinates as well

In [22]:
# building a request URL for Foursquare
import requests

CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
near='London'
category='4bf58dd8d48988d167941735'
limit=500
query='skate park'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&category={}&near={}&query={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    category, 
    near, 
    query,
    limit)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


In [18]:
results = requests.get(url).json()
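
The URL above is assembled by string formatting, so values such as `query='skate park'` go in unescaped. A hedged alternative is to let `urllib.parse.urlencode` build the query string. The parameter names below are copied from the cell above; the helper name `build_explore_url` is hypothetical, and the credentials are left empty as in the original.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, category, near, query, limit):
    """Build the venues/explore URL with URL-encoded parameter values."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "category": category,
        "near": near,
        "query": query,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

url = build_explore_url("", "", "20180605", "4bf58dd8d48988d167941735",
                        "London", "skate park", 500)
print(url)
```

The resulting `url` can be passed to `requests.get` exactly as before.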


So the data source for 1.1 is a data frame with skatepark names and coordinates

In [7]:
from pandas import json_normalize  # available in pandas >= 1.0

venues = results['response']['groups'][0]['items']
skatepark_venues = json_normalize(venues) # flatten JSON
skatepark_venues = skatepark_venues[['venue.name','venue.location.lat','venue.location.lng']]
skatepark_venues.head()

Unnamed: 0,venue.name,venue.location.lat,venue.location.lng
0,House of Vans,51.500678,-0.113944
1,Southbank Skate Park,51.506911,-0.116636
2,Clapham Common Skatepark,51.460802,-0.143462
3,BaySixty6,51.520528,-0.204501
4,Mile End Skate Park,51.517583,-0.03146
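
For step 1.3, the distance from a candidate point to each skatepark can be computed directly from these coordinates. The sketch below uses the haversine great-circle formula rather than Euclidean distance, since the inputs are latitude/longitude pairs. The skatepark coordinates are the five rows shown above; the candidate point is the Barnet entry from the Wikipedia borough table (0.1517°W, hence a negative longitude). The function name `haversine_km` and the mean Earth radius of 6371 km are choices made for this illustration.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat/lng points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Skatepark coordinates taken from the data frame output above.
skateparks = {
    "House of Vans": (51.500678, -0.113944),
    "Southbank Skate Park": (51.506911, -0.116636),
    "Clapham Common Skatepark": (51.460802, -0.143462),
    "BaySixty6": (51.520528, -0.204501),
    "Mile End Skate Park": (51.517583, -0.03146),
}

# Candidate point: Barnet coordinates from the borough table (west = negative).
candidate = (51.6252, -0.1517)

distances = {
    name: haversine_km(candidate[0], candidate[1], lat, lng)
    for name, (lat, lng) in skateparks.items()
}
nearest = min(distances, key=distances.get)
print(nearest, round(distances[nearest], 1), "km")
```

Repeating this for every borough gives a simple numeric ranking to compare against the map-based view.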


<h2>Skateshop locations in each London Borough</h2>

Part 1.2 is less trivial, as I need to find (1) the coordinates of all the boroughs and (2) a data set that will supply features for a predictive model, such as demographic data for each borough

<h3>Table 1.2 - list of all London Boroughs with coordinates</h3>

For Table 1.2 I'll parse the London boroughs table from Wikipedia

In [25]:

# parsing wiki page into html 
import urllib.request
from bs4 import BeautifulSoup

fp = urllib.request.urlopen("")
mybytes = fp.read()

mystr = mybytes.decode("utf8")
fp.close()

# parsing html into a BeautifulSoup object

soup = BeautifulSoup(mystr, 'html.parser')


In [26]:
# creating a list of cell values for each parsed table row

table = soup.find('table', attrs={'class':'wikitable'})
table_rows = table.find_all('tr')
l = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [cell.text.replace('\n','') for cell in td]
    l.append(row)


In [13]:
# creating column names from the table headers

col_rows=table.find_all('th')

cols=[]
for th in col_rows:
    row=th.text.replace('\n','')
    cols.append(row)
cols

['Borough',
 'Inner',
 'Status',
 'Local authority',
 'Political control',
 'Headquarters',
 'Area (sq mi)',
 'Population (2013 est)[1]',
 'Co-ordinates',
 ' Nr. in map ']

In [14]:
# building a dataframe
import pandas as pd

df=pd.DataFrame(l, columns=cols)
df = df.iloc[1:]  # the first row is the header row, which has no <td> cells
df.head()

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map
1,Barking and Dagenham [note 1],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E﻿ /...,25
2,Barnet,,,Barnet London Borough Council,Conservative,"Barnet House, 2 Bristol Avenue, Colindale",33.49,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...,31
3,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...,23
4,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...,12
5,Bromley,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...,20


This dataframe will be the main source of geolocation data

In [15]:
# cleaning the borough names: strip footnote markers such as " [note 1]"
df['Borough'] = df['Borough'].str.replace(r'\s*\[note \d+\]', '', regex=True).str.strip()


In [16]:
lattitude=[]
longitude=[]
for x in df['Co-ordinates']:
    z=x.index('51.') 
    z1=x.index('0.')
    # str.find returns -1 (truthy) when 'W' is absent, so test membership instead
    if 'W' in x:
        long=0-float(x[z1:z1+6])  # west of Greenwich: negative longitude
    else:
        long=float(x[z1:z1+6])
    lat=float(x[z:z+7])
    lattitude.append(lat)
    
    longitude.append(long)

The final dataframe with London boroughs looks like this

In [17]:
df['Longitude']=longitude
df['Lattitude']=lattitude
df.head()

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map,Longitude,Lattitude
1,Barking and Dagenham,,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E﻿ /...,25,0.1557,51.5607
2,Barnet,,,Barnet London Borough Council,Conservative,"Barnet House, 2 Bristol Avenue, Colindale",33.49,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...,31,-0.1517,51.6252
3,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...,23,0.1505,51.4549
4,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...,12,-0.2817,51.5588
5,Bromley,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...,20,0.0198,51.4039


<h2>Table 1.2.2. London open demography data with stats for each Borough</h2>

For the feature table I downloaded a file containing London census data from the council website

In [283]:
london_boroughs=pd.read_csv("/london_boroughs.csv",skiprows=0)
london_boroughs=london_boroughs[1:]
london_boroughs.head()

Unnamed: 0,Code,New code,Area name,Inner/ Outer London,GLA Population Estimate 2016,GLA Household Estimate 2016,Inland Area (Hectares),Population density (per hectare) 2016,"Average Age, 2016","Proportion of population aged 0-15, 2016","Proportion of population of working-age, 2016","Proportion of population aged 65 and over, 2016",Net internal migration (2014),Net international migration (2014),Net natural change (2014),% of resident population born abroad (2014),Largest migrant population by country of birth (2011),% of largest migrant population (2011),Second largest migrant population by country of birth (2011),% of second largest migrant population (2011),Third largest migrant population by country of birth (2011),% of third largest migrant population (2011),% of population from BAME groups (2016),% people aged 3+ whose main language is not English (2011 Census),"Overseas nationals entering the UK (NINo), (2014/15)","New migrant (NINo) rates, (2014/15)",Largest migrant population arrived during 2014/15,Second largest migrant population arrived during 2014/15,Third largest migrant population arrived during 2014/15,Employment rate (%) (2015),Male employment rate (2015),Female employment rate (2015),Unemployment rate (2015),Youth Unemployment (claimant) rate 18-24 (Dec-14),Proportion of 16-18 year olds who are NEET (%) (2014),Proportion of the working-age population who claim out-of-work benefits (%) (Aug-2015),% working-age with a disability (2015),Proportion of working age people with no qualifications (%) 2015,Proportion of working age with degree or equivalent and above (%) 2015,"Gross Annual Pay, (2015)",Gross Annual Pay - Male (2015),Gross Annual Pay - Female (2015),Modelled Household median income estimates 2012/13,% adults that volunteered in past 12 months (2010/11 to 2012/13),Number of jobs by workplace (2014),% of employment that is in public sector (2014),"Jobs Density, 2014","Number of active businesses, 2014",Two-year business survival rates 
(started in 2012),Crime rates per thousand population 2014/15,Fires per thousand population (2014),Ambulance incidents per hundred population (2014),"Median House Price, 2014","Average Band D Council Tax charge (£), 2015/16",New Homes (net) 2014/15 (provisional),"Homes Owned outright, (2014) %","Being bought with mortgage or loan, (2014) %","Rented from Local Authority or Housing Association, (2014) %","Rented from Private landlord, (2014) %","% of area that is Greenspace, 2005",Total carbon emissions (2013),"Household Waste Recycling Rate, 2014/15","Number of cars, (2011 Census)","Number of cars per household, (2011 Census)","% of adults who cycle at least once per month, 2013/14","Average Public Transport Accessibility score, 2014","Achievement of 5 or more A*- C grades at GCSE or equivalent including English and Maths, 2013/14",Rates of Children Looked After (2015),% of pupils whose first language is not English (2015),% children living in out-of-work households (2014),"Male life expectancy, (2012-14)","Female life expectancy, (2012-14)",Teenage conception rate (2014),Life satisfaction score 2011-14 (out of 10),Worthwhileness score 2011-14 (out of 10),Happiness score 2011-14 (out of 10),Anxiety score 2011-14 (out of 10),Childhood Obesity Prevalance (%) 2014/15,People aged 17+ with diabetes (%),Mortality rate from causes considered preventable 2012/14,Political control in council,Proportion of seats won by Conservatives in 2014 election,Proportion of seats won by Labour in 2014 election,Proportion of seats won by Lib Dems in 2014 election,Turnout at 2014 local elections
1,E09000001,E09000001,City of London,Inner London,8548,5179,290.4,28.9,42.9,27.2,90.6,9.4,138,252,35,.,United States,2.8,France,2.0,Australia,1.9,27.5,17.1,892,151.0,France,United States,India,64.6,.,.,.,1.2,.,3.8,.,.,.,.,.,.,"£99,390",.,500400,3.4,84.6,19250,63.0,.,12.3,.,765000,943,230,.,.,.,.,4.8,1417.5,34.4,1692,0.4,.,7.9,78.6,84,.,9.1,.,.,.,6.59,7.08,5.99,5.57,,2.6,128.8,.,.,.,.,.
2,E09000002,E09000002,Barking and Dagenham,Outer London,205773,76841,3610.8,57.3,32.9,21.0,86.1,13.9,-1118,2543,2509,37.4,Nigeria,4.7,India,2.3,Pakistan,2.3,49.5,18.7,7727,62.0,Romania,Bulgaria,Lithuania,65.8,75.6,56.5,11.0,7.3,5.7,11.7,17.2,11.3,32.2,"£28,428","£29,792","£25,251","£34,080",21,58900,21.1,0.47,5690,73.0,83.4,3.0,13.7,215000,1332,510,16.4,27.4,35.9,20.3,33.6,783.2,23.4,56966,0.8,6.5,3.0,58.0,77,47.2,22.5,77.6,82.1,32.4,7.14,7.6,7.05,3.05,25.3,7.3,227.6,Lab,0.0,100.0,0.0,36.5
3,E09000003,E09000003,Barnet,Outer London,385108,149147,8674.8,44.5,37.2,21.0,83.3,16.7,-1884,4770,2938,35.9,India,3.1,Poland,2.4,Iran,2.0,38.7,23.4,14412,59.0,Romania,Poland,Italy,68.5,74.5,62.9,8.5,3.5,2.5,6.7,14.9,5.2,49.0,"£33,084","£37,058","£30,449","£54,530",33,167300,18.7,0.69,24555,70.0,62.7,1.6,11.1,400000,1397,1320,32.4,25.2,11.1,31.1,41.3,1552.7,38.0,144717,1.1,12.1,3.0,67.3,34,43.4,10.8,82.1,85.1,12.8,7.48,7.76,7.37,2.75,18.4,6.0,133.8,Cons,50.8,47.6,1.6,40.5
4,E09000004,E09000004,Bexley,Outer London,243303,97233,6058.1,39.9,38.9,20.8,89.0,11.0,1273,699,1195,16.1,Nigeria,2.6,India,1.5,Ireland,0.9,21.4,6.0,2108,14.0,Romania,Nigeria,Poland,75.1,82.1,68.5,7.6,3.8,3.4,7.3,15.9,10.8,33.5,"£32,040","£36,020","£25,776","£44,430",22,80700,15.9,0.54,8430,75.0,51.8,2.3,11.8,250000,1446,810,38.1,35.3,15.2,11.4,31.7,1060.9,54.0,108507,1.2,9.2,2.6,60.3,50,15.0,15.4,80.4,84.4,19.5,7.38,7.7,7.21,3.29,21.4,6.9,164.3,Cons,71.4,23.8,0.0,39.6
5,E09000005,E09000005,Brent,Outer London,328568,119166,4323.3,76.1,35.5,20.1,82.5,17.5,-6932,6717,3694,56.2,India,9.2,Poland,3.4,Ireland,2.9,64.9,37.2,25130,115.0,Romania,Italy,Portugal,69.5,76.0,62.6,7.5,6.1,2.6,9.0,17.7,6.2,45.1,"£29,777","£31,149","£27,653","£39,630",17,133600,17.6,0.61,14680,70.0,78.8,1.8,12.1,385000,1354,1560,22.2,22.6,20.4,34.8,21.9,1292.6,35.2,87802,0.8,11.7,3.7,60.1,44,62.7,16.0,80.1,85.1,18.5,7.25,7.35,7.22,2.92,23.9,7.9,169.4,Lab,9.5,88.9,1.6,36.3
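
Before these columns can serve as model features, the raw strings need cleaning: the rows above use `.` for missing values and currency strings such as `£28,428`. A minimal sketch of such a converter follows. The function name `to_number` is hypothetical, and treating `.` and empty cells as missing is an assumption based on the rows shown.

```python
def to_number(raw):
    """Convert a raw cell such as '£28,428' or '57.3' to float; None if missing."""
    text = str(raw).strip()
    if text in ("", "."):
        return None  # assumed missing-value markers
    cleaned = text.replace("£", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        return None  # non-numeric cells such as country names

# Sample cells copied from the rows above.
sample = ["£99,390", ".", "205773", "57.3", "Nigeria", "-1118"]
print([to_number(v) for v in sample])
```

With pandas this could be applied column by column, for example `london_boroughs[col].map(to_number)` for each numeric column `col`.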


<h2>Table 2.1. Open data on customer reviews for all skateshops in London</h2>
<p>For this task I downloaded skateboarding shop reviews from Google Maps and stored them in a CSV file</p>

In [19]:
df=pd.read_csv('/Skateshop.csv')
df.head()

Unnamed: 0,Mark,Review
0,4,"The staff were friendly and very helpful, and ..."
1,4,My only advice is to check the website before ...
2,3,"The guy who assisted me was very nice, patient..."
3,5,Super helpful and super nice guys. Got a custo...
4,2,Went into the store yesterday. Arrogant and un...
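
Steps 2.1 and 2.2 could look roughly like this. The sketch labels each review as good or bad from its `Mark` and counts the most frequent words per label, which is the input a word cloud would visualise. The threshold of 4 for a good review, the tiny stop-word list and the helper names `label` and `tokens` are assumptions for illustration. The review texts are the truncated fragments shown above, so a few tokens are cut-off words.

```python
import re
from collections import Counter

GOOD_THRESHOLD = 4  # assumption: marks of 4 and 5 count as good reviews
STOP_WORDS = {"the", "and", "were", "was", "to", "is", "my", "me", "a",
              "who", "into", "got", "very", "before", "only"}

# (Mark, Review) pairs copied from the truncated output above.
reviews = [
    (4, "The staff were friendly and very helpful, and ..."),
    (4, "My only advice is to check the website before ..."),
    (3, "The guy who assisted me was very nice, patient..."),
    (5, "Super helpful and super nice guys. Got a custo..."),
    (2, "Went into the store yesterday. Arrogant and un..."),
]

def label(mark):
    """Map a numeric mark to a 'good' or 'bad' label."""
    return "good" if mark >= GOOD_THRESHOLD else "bad"

def tokens(text):
    """Lowercase the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

word_counts = {"good": Counter(), "bad": Counter()}
for mark, text in reviews:
    word_counts[label(mark)].update(tokens(text))

print(word_counts["good"].most_common(3))
print(word_counts["bad"].most_common(3))
```

The two counters can be handed to any word cloud renderer, one cloud per label.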


<h2>Table 3. Keyword data on most popular skateboarding brands</h2>
<p>For this task I parsed a list of popular skate brands</p>

In [None]:
fp = urllib.request.urlopen()
mybytes = fp.read()

mystr = mybytes.decode("utf8")
fp.close()

In [20]:
# some of the results I got
selector = ['adidas',
 'Anti Hero',
 'Antix',
 'Anuell',
 'Baker',
 'Carhartt WIP',
 'Carpet Company',
 'Chocolate',
 'Converse',
 'DC',
 'Dickies',
 'Element',
 'Emerica',
 'Enjoi',
 'Etnies',
 'Flip',
 'Girl',
 'GX1000']

['adidas',
 'Anti Hero',
 'Antix',
 'Anuell',
 'Baker',
 'Carhartt WIP',
 'Carpet Company',
 'Chocolate',
 'Converse',
 'DC',
 'Dickies',
 'Element',
 'Emerica',
 'Enjoi',
 'Etnies',
 'Flip',
 'Girl',
 'GX1000']

Then we can use the Google Ads Keyword Planner API to find which of them get the most traffic 

In [None]:
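# best keyword estimate
# `selector` from the cell above is a plain list of brand names, while the
# service expects a dict, so the request is built under a separate name.
# Assumptions: `targeting_idea_service` is an already initialised AdWords
# TargetingIdeaService client and PAGE_SIZE is an illustrative page size.
PAGE_SIZE = 100
offset = 0
idea_selector = {
    'ideaType': 'KEYWORD',
    'requestType': 'STATS',
    'requestedAttributeTypes': [
        'KEYWORD_TEXT', 'SEARCH_VOLUME', 'CATEGORY_PRODUCTS_AND_SERVICES'],
    'searchParameters': [{
        'xsi_type': 'RelatedToQuerySearchParameter',
        'queries': selector
    }],
    'paging': {
        'startIndex': str(offset),
        'numberResults': str(PAGE_SIZE)
    }
}

page = targeting_idea_service.get(idea_selector)

for result in page['entries']:
  attributes = {}
  for attribute in result['data']:
    attributes[attribute['key']] = getattr(
        attribute['value'], 'value', '0')
  print('Keyword with "%s" text and average monthly search volume '
        '"%s" was found with Products and Services categories: %s.'
        % (attributes['KEYWORD_TEXT'],
            attributes['SEARCH_VOLUME'],
            attributes['CATEGORY_PRODUCTS_AND_SERVICES']))    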
