# Data Inspection + Preparation

The file `food_articles_dec_23_2021.json` contains the output of a SPARQL query on Wikidata. This is the query:
```sparql
# Food
SELECT ?item ?itemLabel
WHERE
{
 # Must be a `subclass of` (P279) the `food` entity (Q2095)
 ?item wdt:P279 wd:Q2095.
 # Returns each label in your language, falling back to English
 SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```
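
How the JSON file was exported is not recorded in this notebook. As a rough sketch of one way to reproduce such an export, the query can be sent to the public Wikidata Query Service endpoint as a URL-encoded `query` parameter. The names `WDQS_ENDPOINT`, `FOOD_QUERY` and `build_query_url` are made up for illustration, and the raw response from the endpoint would still need reshaping into the `item` / `itemLabel` columns seen below.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above, without the comments
FOOD_QUERY = """
SELECT ?item ?itemLabel
WHERE
{
 ?item wdt:P279 wd:Q2095.
 SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
"""


def build_query_url(query, endpoint=WDQS_ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results (hypothetical helper)."""
    return f"{endpoint}?{urlencode({'query': query.strip(), 'format': 'json'})}"


request_url = build_query_url(FOOD_QUERY)
```

The sketch only builds the URL; fetching it (with a descriptive `User-Agent` header) is left out so the block runs without network access.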

I will now inspect the data.

In [1]:
import json
import pandas as pd

# Load the query results; the `with` block closes the file on exit
with open('food_articles_dec_23_2021.json') as f:
    dict_data = json.load(f)

df = pd.DataFrame.from_dict(dict_data)
df.head()

Unnamed: 0,item,itemLabel
0,http://www.wikidata.org/entity/Q4287,margarine
1,http://www.wikidata.org/entity/Q4551,borscht
2,http://www.wikidata.org/entity/Q6099,escalope
3,http://www.wikidata.org/entity/Q6591,speculaas
4,http://www.wikidata.org/entity/Q6663,hamburger


OK, so it seems I need to clean this up a little bit.
Step 1: reduce the `item` column to the bare Q code.

In [2]:
# Strips away the `http://www.wikidata.org/entity/` prefix (31 characters) from the 'item' column
def stripURL(url):
    return url[31:]

df['item'] = df['item'].apply(stripURL)
df.head()

Unnamed: 0,item,itemLabel
0,Q4287,margarine
1,Q4551,borscht
2,Q6099,escalope
3,Q6591,speculaas
4,Q6663,hamburger
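
The fixed slice works because every row shares the same 31-character prefix. A slightly more defensive variant, sketched here under the assumption that every value ends in `/entity/Q<digits>`, pulls the Q code out with a regular expression and fails loudly on anything unexpected. `QID_PATTERN` and `extract_qid` are illustrative names, not part of the notebook.

```python
import re

# Matches the trailing Q code of a Wikidata entity URL
QID_PATTERN = re.compile(r"/entity/(Q\d+)$")


def extract_qid(entity_url):
    """Return the Q code at the end of an entity URL (hypothetical helper)."""
    match = QID_PATTERN.search(entity_url)
    if match is None:
        raise ValueError(f"not a Wikidata entity URL: {entity_url!r}")
    return match.group(1)


qid = extract_qid("http://www.wikidata.org/entity/Q4287")
```

It could be dropped in as `df['item'].apply(extract_qid)`, and it would keep working if the prefix ever changed length (for example `https://` instead of `http://`).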


Next step: use the Q codes to retrieve more info about each entry. I will use the package `pywikibot` to interface with the Wikidata API.

In [21]:
import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

# Test: try to pull up the wiki links for row 2 (escalope)
qcode = df.iloc[2]['item']
item = pywikibot.ItemPage(repo, qcode)

In [36]:
item_labels = item.get()['labels']
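
The `labels` entry is keyed by language code, which makes it tempting to turn each entry into a Wikipedia URL. A minimal sketch of that idea, assuming a plain `dict` of language code to label: spaces become underscores and non-ASCII characters are percent-encoded so the result is a valid URL. `label_to_wikipedia_url` is a hypothetical helper. The URLs are only candidates, since a label is not guaranteed to match an article title; the item's sitelinks are what record which articles actually exist.

```python
from urllib.parse import quote


def label_to_wikipedia_url(lang, label):
    """Build a candidate Wikipedia URL from a language code and a label (hypothetical helper)."""
    title = label.replace(" ", "_")  # Wikipedia titles use underscores for spaces
    return f"https://{lang}.wikipedia.org/wiki/{quote(title)}"


# Small stand-in for the mapping returned by item.get()['labels']
sample_labels = {"de": "Schnitzel", "cs": "řízek", "en": "ice cream"}
candidate_urls = [
    label_to_wikipedia_url(lang, label) for lang, label in sample_labels.items()
]
```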


In [40]:
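# Build one candidate URL per language label. Labels are not guaranteed to
# match article titles, and codes such as 'en-ca' or 'zh-hans' are label
# variants rather than Wikipedia subdomains, so some links may not resolve.
item_urls = []
for lang, label in item_labels.items():
    item_urls.append(f"https://{lang}.wikipedia.org/wiki/{label}")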

item_urls


['https://de.wikipedia.org/wiki/Schnitzel',
 'https://en.wikipedia.org/wiki/escalope',
 'https://fr.wikipedia.org/wiki/escalope',
 'https://ar.wikipedia.org/wiki/إسكالوب',
 'https://ca.wikipedia.org/wiki/Escalopes',
 'https://cs.wikipedia.org/wiki/řízek',
 'https://en-ca.wikipedia.org/wiki/escalope',
 'https://fa.wikipedia.org/wiki/شنتسل',
 'https://he.wikipedia.org/wiki/שניצל',
 'https://ja.wikipedia.org/wiki/エスカロープ',
 'https://nl.wikipedia.org/wiki/schnitzel',
 'https://ru.wikipedia.org/wiki/эскалоп',
 'https://sl.wikipedia.org/wiki/Zrezek',
 'https://sv.wikipedia.org/wiki/Schnitzel',
 'https://uk.wikipedia.org/wiki/ескалоп',
 'https://zh.wikipedia.org/wiki/香雞排',
 'https://zh-cn.wikipedia.org/wiki/香雞排',
 'https://zh-hans.wikipedia.org/wiki/香雞排',
 'https://zh-hant.wikipedia.org/wiki/香雞排',
 'https://de-ch.wikipedia.org/wiki/Schnitzel',
 'https://pl.wikipedia.org/wiki/eskalopka',
 'https://yi.wikipedia.org/wiki/ווינער שניצל',
 'https://eo.wikipedia.org/wiki/eskalopo',
 'https://pt.wikip