# <font color='blue'> Scraping NBA - Basketball data </font> 

#### Website: [NBA Stats](http://stats.nba.com)



#### Tool: [Basket Strategies in Shiny](https://mariazm.shinyapps.io/basketstrategies/)



In [19]:
import json

### JavaScript and JSON to Python dictionaries (.js)

In [5]:

### Downloading the JAVASCRIPT file
### http://stats.nba.com/js/data/ptsd/stats_ptsd.js

out_pathname = "scrap_docs/stats_ptsd.js"
with open(out_pathname, 'r') as jf:
    data = jf.read()
    
# JavaScript Variable
target_variable = "stats_ptsd = "
index_variable = data.find(target_variable)
print ( "\nThe variable required is located at the character-index number {0} \n".format(index_variable) )

# Note how the data is structured:
# double quotes are used for the principal dictionary keys: generated, seasons_count, teams_count, players_count and data

string_for_json = data[ index_variable+len(target_variable): -1]    # -1 to get rid of the semicolon  ;
data_json = json.loads(string_for_json)

print ("Principal keys: {}\n".format(data_json.keys()))
print ("Secondary data-keys: {}\n".format(data_json['data'].keys()))



The variable required is located at the character-index number 4 

Principal keys: [u'generated', u'players_count', u'data', u'teams_count', u'seasons_count']

Secondary data-keys: [u'seasons', u'players', u'teams']
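
The slicing in the cell above relies on the semicolon being the very last character of the file. As a minimal sketch, the same idea can be wrapped in a helper that also tolerates trailing whitespace or a final newline. The function name `extract_js_object` and the miniature `sample_js` string are hypothetical and only illustrate the technique. The sample assumes the file starts with `var `, which matches the reported index of 4, and that the assignment is the last statement in the file.

```python
import json


def extract_js_object(js_text, variable_name):
    """Return the JSON value assigned to `variable_name` in a JavaScript source string."""
    marker = variable_name + " = "
    start = js_text.find(marker)
    if start == -1:
        raise ValueError("variable {0!r} not found".format(variable_name))
    # Keep everything after the assignment, then drop surrounding
    # whitespace and the trailing semicolon before parsing.
    payload = js_text[start + len(marker):].strip().rstrip(";")
    return json.loads(payload)


# Hypothetical miniature of the real file (the real one is far larger).
sample_js = 'var stats_ptsd = {"generated": "sample", "data": {"teams": [], "players": []}};\n'

parsed = extract_js_object(sample_js, "stats_ptsd")
print(sorted(parsed.keys()))
print(sorted(parsed["data"].keys()))
```

With the real file, the call would be `extract_js_object(data, "stats_ptsd")`, where `data` is the string read from `scrap_docs/stats_ptsd.js`.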



### JSON to Python dictionaries (.json)


<table class="image">
<caption text-align="center">This is how the data looks today!! </caption>
<tr><td><img src="Capture.PNG" style="max-width:100%; width: 50%; max-width: none; text: 'hello'" ></td></tr>
</table>


In [21]:

### Downloading the JSON file
### http://stats.nba.com/schedule/summerleague/#!?PD=N
### http://data.nba.com/data/10s/v2015/json/mobile_teams/utah/2017/scores/14_todays_scores.json

out_pathname = "scrap_docs/14_full_schedule.json"
with open(out_pathname,'r') as jf:  
    data = json.load(jf)
    
print data['lscd'][0]['mscd']['mon'] + "\n"
print "Options:", data['lscd'][0]['mscd']['g'][1].keys(), "\n"

print "Example:\n"
print data['lscd'][0]['mscd']['g'][0]['v']  # Visiting team
print data['lscd'][0]['mscd']['g'][0]['h']  # Home team
print "\n"


July

Options: [u'bd', u'h', u'ac', u'ptsls', u'seq', u'gcode', u'is', u'v', u'vtm', u'an', u'stt', u'as', u'gid', u'gdte', u'st', u'ppdst', u'gdtutc', u'etm', u'htm', u'utctm', u'seri'] 

Example:

{u'tn': u'Hornets', u're': u'1-0', u's': u'74', u'tid': 1610612766, u'tc': u'Charlotte', u'ta': u'CHA'}
{u'tn': u'Heat', u're': u'0-1', u's': u'67', u'tid': 1610612748, u'tc': u'Miami', u'ta': u'MIA'}
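
The nested lookups above can be turned into a small loop over every game. The sketch below assumes the structure seen in the output (`lscd` holds month blocks, each with `mscd['mon']` and a list of games in `mscd['g']`). The names `summarize_games`, `to_score` and `sample_schedule` are hypothetical, and the sample is a trimmed-down stand-in for the real file that reuses the values printed above. Treating an empty score string as "not played yet" is also an assumption.

```python
def to_score(value):
    """Convert a score string such as '74' to int; empty or missing scores become None."""
    return int(value) if value else None


def summarize_games(schedule):
    """Yield (month, visitor, visitor_score, home, home_score) for every game."""
    for month_block in schedule["lscd"]:
        month = month_block["mscd"]["mon"]
        for game in month_block["mscd"]["g"]:
            visitor, home = game["v"], game["h"]
            # Scores are stored as strings in the feed, so convert them.
            yield (month, visitor["ta"], to_score(visitor["s"]),
                   home["ta"], to_score(home["s"]))


# Trimmed-down sample mirroring the structure of the schedule file.
sample_schedule = {
    "lscd": [
        {
            "mscd": {
                "mon": "July",
                "g": [
                    {
                        "v": {"ta": "CHA", "tn": "Hornets", "s": "74"},
                        "h": {"ta": "MIA", "tn": "Heat", "s": "67"},
                    }
                ],
            }
        }
    ]
}

for row in summarize_games(sample_schedule):
    print(row)
```

With the real file loaded as `data`, the same generator could feed `pd.DataFrame(list(summarize_games(data)))` to get one row per game.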




### JSON format with the Elasticsearch engine

This is an example of how to work with Elasticsearch in Python. First install the Elasticsearch Python package and set up your Elasticsearch server. In a local environment, this means that while running the Python commands below, a terminal/shell must be open (inside the `elasticsearch/bin` folder) with the `elasticsearch` executable running [(details here)](https://www.elastic.co/downloads/elasticsearch).

Elasticsearch is a distributed, multitenant full-text search engine for schema-free JSON documents. It exposes an HTTP web interface (hence the need for a running server/host) and has been used by Facebook, Mozilla and SoundCloud, among others. It is written in Java, built on Apache Lucene, and was released as open source under the Apache License.


In [48]:

from elasticsearch import Elasticsearch, helpers
import sys, os

es = Elasticsearch()

def load_json(filename):
    if filename.endswith('.json'):
        with open(filename,'r') as open_file:
            yield json.load(open_file)

helpers.bulk(es, load_json("scrap_docs/14_full_schedule.json"), index='index_basket', doc_type='games_scores')


(1, [])
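
The result `(1, [])` means one action succeeded with no errors: the whole schedule file was indexed as a single document. A possible alternative, sketched below, is to yield one bulk action per game so that each game becomes its own searchable document. The generator name `game_actions` and the choice of `gid` as the document id are assumptions, and the sample is a trimmed-down stand-in for the real file. The generator itself needs no running server.

```python
def game_actions(schedule):
    """Yield one bulk action per game, using the game id (`gid`) as the document id."""
    for month_block in schedule["lscd"]:
        for game in month_block["mscd"]["g"]:
            # `_id` and `_source` are the keys the bulk helper reads from each action;
            # the index name can still be supplied as an argument to helpers.bulk.
            yield {"_id": game["gid"], "_source": game}


# Trimmed-down sample with the same nesting as the schedule file.
sample_schedule = {
    "lscd": [
        {
            "mscd": {
                "mon": "July",
                "g": [
                    {"gid": "1421700001", "gcode": "20170701/CHAMIA", "stt": "Final"},
                ],
            }
        }
    ]
}

actions = list(game_actions(sample_schedule))
print("{0} action(s), first id: {1}".format(len(actions), actions[0]["_id"]))

# With a running server, the generator could replace load_json in the call above:
# helpers.bulk(es, game_actions(data), index='index_basket', doc_type='games_scores')
```

The `doc_type` argument follows the call used in this notebook; newer Elasticsearch versions may not accept it.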

In [53]:

## Retrieve the stored schedule document (which contains the Miami game) by its ID

#result = es.search(index='index_basket', doc_type='games_scores', body={"query": {"match": {"_id": "AV1RYls2qXy3PnP3U4gf"}} } )
result= es.get(index='index_basket', doc_type='games_scores', id= "AV1RacXtqXy3PnP3U4go")
                   
result['_source']


{u'lscd': [{u'mscd': {u'g': [{u'ac': u'Orlando',
      u'an': u'Amway Center',
      u'as': u'FL',
      u'bd': {u'b': [{u'disp': u'NBATV',
         u'lan': u'English',
         u'scope': u'natl',
         u'seq': 1,
         u'type': u'tv'}]},
      u'etm': u'2017-07-01T11:00:00',
      u'gcode': u'20170701/CHAMIA',
      u'gdte': u'2017-07-01',
      u'gdtutc': u'2017-07-01',
      u'gid': u'1421700001',
      u'h': {u're': u'0-1',
       u's': u'67',
       u'ta': u'MIA',
       u'tc': u'Miami',
       u'tid': 1610612748,
       u'tn': u'Heat'},
      u'htm': u'2017-07-01T11:00:00',
      u'is': 1,
      u'ppdst': u'I',
      u'ptsls': {u'pl': [{u'fn': u'Okaro',
         u'ln': u'White',
         u'pid': u'1627855',
         u'ta': u'MIA',
         u'tc': u'Miami',
         u'tid': 1610612748,
         u'tn': u'Heat',
         u'val': u'20'}]},
      u'seq': 1,
      u'seri': u'',
      u'st': u'3',
      u'stt': u'Final',
      u'utctm': u'15:00',
      u'v': {u're': u'1-0',
      

In [47]:

es.indices.delete(index='index_basket', ignore=[400, 404])


{u'acknowledged': True}

### JSON from the web (JSON response over HTTP)

In [20]:

### Using HTTP to download a JSON response

import requests
import pandas as pd

my_url= 'http://stats.nba.com/stats/commonteamroster?LeagueID=00&Season=2013-06&TeamID=1610612737'

# The User-Agent string can be copied from the browser: Inspect -> Network -> Headers -> Request headers -> User-Agent
# It differs between operating systems (e.g. iOS or Windows)

headers_nba = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
response = requests.get(my_url, headers = headers_nba)

result_set = response.json()['resultSets'][0]   # parse the response body once
headers = result_set['headers']
players = result_set['rowSet']
players_df = pd.DataFrame(players, columns=headers)

players_df.head(10)


|   | TeamID | SEASON | LeagueID | PLAYER | NUM | POSITION | HEIGHT | WEIGHT | BIRTH_DATE | AGE | EXP | SCHOOL | PLAYER_ID |
|---|--------|--------|----------|--------|-----|----------|--------|--------|------------|-----|-----|--------|-----------|
| 0 | 1610612737 | 2013 | 0 | Jeff Teague | 0 | G | 6-2 | 181 | JUN 10, 1988 | 26.0 | 4 | Wake Forest | 201952 |
| 1 | 1610612737 | 2013 | 0 | Lou Williams | 3 | G | 6-1 | 175 | OCT 27, 1986 | 27.0 | 8 | South Gwinnett HS (GA) | 101150 |
| 2 | 1610612737 | 2013 | 0 | Paul Millsap | 4 | F | 6-8 | 253 | FEB 10, 1985 | 29.0 | 7 | Louisiana Tech | 200794 |
| 3 | 1610612737 | 2013 | 0 | DeMarre Carroll | 5 | F | 6-8 | 212 | JUL 27, 1986 | 27.0 | 4 | Missouri | 201960 |
| 4 | 1610612737 | 2013 | 0 | Pero Antic | 6 | C | 6-11 | 260 | JUL 29, 1982 | 31.0 | R | Macedonia | 203544 |
| 5 | 1610612737 | 2013 | 0 | Shelvin Mack | 8 | G | 6-3 | 207 | APR 22, 1990 | 24.0 | 2 | Butler | 202714 |
| 6 | 1610612737 | 2013 | 0 | John Jenkins | 12 | G | 6-4 | 215 | MAR 06, 1991 | 23.0 | 1 | Vanderbilt | 203098 |
| 7 | 1610612737 | 2013 | 0 | Gustavo Ayon | 14 | F-C | 6-10 | 250 | APR 01, 1985 | 29.0 | 2 | Tepic, Mexico | 202970 |
| 8 | 1610612737 | 2013 | 0 | Al Horford | 15 | C-F | 6-10 | 250 | JUN 03, 1986 | 28.0 | 6 | Florida | 201143 |
| 9 | 1610612737 | 2013 | 0 | Dennis Schroder | 17 | G | 6-1 | 168 | SEP 15, 1993 | 20.0 | R | Germany | 203471 |
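
The response pairs a `headers` list with a `rowSet` list of rows, so each row can also be zipped with the headers into a plain dictionary, with no pandas needed. The sketch below assumes that layout. The helper name `result_set_to_records` is hypothetical, and `sample_payload` is a trimmed-down stand-in with a few columns taken from the output above.

```python
def result_set_to_records(payload, index=0):
    """Turn one entry of payload['resultSets'] into a list of {header: value} dicts."""
    result_set = payload["resultSets"][index]
    headers = result_set["headers"]
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


# Trimmed-down sample shaped like the roster response.
sample_payload = {
    "resultSets": [
        {
            "headers": ["PLAYER", "NUM", "POSITION", "PLAYER_ID"],
            "rowSet": [
                ["Jeff Teague", "0", "G", 201952],
                ["Al Horford", "15", "C-F", 201143],
            ],
        }
    ]
}

records = result_set_to_records(sample_payload)
print(records[0]["PLAYER"])

# With a live request, a timeout and a status check make failures easier to spot:
# response = requests.get(my_url, headers=headers_nba, timeout=10)
# response.raise_for_status()
# records = result_set_to_records(response.json())
```

A list of dictionaries like `records` can be passed directly to `pd.DataFrame(records)` if a table is still wanted.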


In [12]:

### Video Demo
#<p>
#<video controls src="Attachments/App_Demo.mp4" />
#</p>
