<img src="https://pbs.twimg.com/profile_images/1092394418135539713/eplLRcDN_400x400.jpg" width=80px style="text-align:right"><h1>The Internet of Production Alliance </h1>

## Data collection program for the [OKW, Map of facilities](https://www.internetofproduction.org/open-know-where)


Author: Antonio de Jesus Anaya Hernandez, DevOps eng. for the IoPA.

Author: The Internet of Production Alliance, 2023.

Data was collected by "Makertour and its partners". Source URL: https://www.makertour.fr/workshops/

The Open Know Where (OKW) Initiative is part of the Internet of Production Alliance and its members.

License: CC BY SA

![CC BY SA](https://mirrors.creativecommons.org/presskit/buttons/88x31/svg/by-sa.svg)

Description: Python code for downloading, parsing, filtering, and sorting the data, and for exporting the raw Maker Tour data and the processed IoPA data as CSV.

In [1]:
# Uncomment the line below to install the libraries required to run this notebook:
# !pip install -r requirements.txt

In [2]:
import requests
import pandas as pd
import json
from bs4 import BeautifulSoup
from datetime import datetime

In [4]:
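# Timestamp used in the names of the output files
date = datetime.now().strftime("%Y_%m_%d_%H%M")
url = "https://www.makertour.fr/map"
org_name = "Maker Tour"
# Download the map page; the timeout keeps the request from hanging indefinitely
response = requests.get(url, timeout=30)
print(response)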

<Response [200]>


In [5]:
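# Decode the raw response body as UTF-8
html_bytes = response.content
html_string = html_bytes.decode('utf-8')

# The map <div> stores every marker as a JSON string in its data-markers attribute
soup = BeautifulSoup(html_string, 'html.parser')
map_content = soup.find('div', id='map')
markers_string = map_content.get('data-markers')

# Decode the JSON string into a list of marker records
markers_json = json.loads(markers_string)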
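
The cell above assumes that the page contains a `<div id="map">` element whose `data-markers` attribute holds a JSON array. The sketch below illustrates that structure and adds a guard for the case where the element or attribute is missing. It swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra dependencies. The names `MarkerAttributeParser`, `parse_markers`, and `sample_html` are hypothetical, and the sample fragment is made up for illustration; the real page may differ.

```python
import json
from html.parser import HTMLParser


class MarkerAttributeParser(HTMLParser):
    """Collect the data-markers attribute of the <div id="map"> element."""

    def __init__(self):
        super().__init__()
        self.markers_string = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div" and attributes.get("id") == "map":
            self.markers_string = attributes.get("data-markers")


def parse_markers(html_string):
    """Return the decoded marker list, or an empty list if none is found."""
    parser = MarkerAttributeParser()
    parser.feed(html_string)
    if parser.markers_string is None:
        return []
    return json.loads(parser.markers_string)


# Hypothetical page fragment (assumption): the real markup may differ.
sample_html = (
    '<div id="map" data-markers="[{&quot;lat&quot;: 48.85, '
    '&quot;lng&quot;: 2.35, '
    '&quot;content&quot;: &quot;&lt;h2&gt;Demo&lt;/h2&gt;&quot;}]"></div>'
)
markers = parse_markers(sample_html)
print(markers[0]["lat"], markers[0]["lng"])
```

Returning an empty list instead of raising lets a caller decide whether a page without markers is an error.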

In [6]:
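def extract_info(html_content):
    """Parse one marker's HTML card into [name, city, country, tags, url]."""
    soup = BeautifulSoup(html_content, 'html.parser')

    # Workshop name, "City, Country" location, tags, and relative link
    title = soup.find('h2', class_='workshop-title').text.strip()
    location = soup.find('span', class_='city-country').text.strip()
    tags = [tag.text.strip() for tag in soup.find_all('span', class_='tag primary')]
    href = soup.find('a', class_='card-link')['href'].strip()

    # The location is expected to split into exactly two parts: city and country
    return [title] + location.split(", ") + [tags, "https://www.makertour.fr" + href]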
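
`location.split(", ")` returns two fields only when the location contains exactly one `", "` separator. A location without a comma, or with more than one, would produce rows that no longer line up with the seven column names. A minimal sketch of a more defensive split follows, assuming the country is always the text after the last comma. `split_location` is a hypothetical helper, not part of the notebook, and the example strings are made up.

```python
def split_location(location):
    """Split "City, Country" into exactly two fields.

    Assumption: the country is the text after the last comma.
    """
    parts = [part.strip() for part in location.rsplit(",", 1)]
    if len(parts) == 2:
        city, country = parts
    else:
        # No comma found: keep the text as the city, leave the country empty.
        city, country = parts[0], ""
    return [city, country]


print(split_location("Paris, France"))
print(split_location("Portland, Oregon, USA"))
print(split_location("Singapore"))
```

Because the helper always returns two items, every row built from it has the same number of fields.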

In [6]:
# Build one row per marker: parsed card fields plus the marker coordinates
data = []
for record in markers_json:
    data.append(extract_info(record['content']) + [record['lat'], record['lng']])

columns = ['name', 'city', 'country', 'tags', 'url', 'latitude', 'longitude']

In [7]:
input_ = pd.DataFrame(data, columns=columns)

In [8]:
input_.reset_index(drop=True, inplace=True)

In [None]:
file_name = "raw_" + org_name.replace(" ", "_").lower() + date

In [9]:
input_.to_csv('../data/' + file_name + '.csv')
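
Writing the CSV fails if the `../data/` directory does not exist. The sketch below shows one way to create the directory before writing, using `pathlib`. It reproduces the notebook's file naming (`raw_` + lowercased organisation name + timestamp). `build_output_path` is a hypothetical helper, and the temporary directory and timestamp are placeholders for the demonstration.

```python
import tempfile
from pathlib import Path


def build_output_path(directory, org_name, timestamp, suffix=".csv"):
    """Create the directory if needed and return the output file path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Same naming scheme as the notebook: raw_<org name><timestamp>
    stem = "raw_" + org_name.replace(" ", "_").lower() + timestamp
    return directory / (stem + suffix)


# A temporary directory stands in for ../data/ (assumption for the demo).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = build_output_path(Path(tmp) / "data", "Maker Tour", "2023_01_31_0930")
    created = csv_path.parent.is_dir()
    print(csv_path.name)
```

The returned path can be passed directly to `DataFrame.to_csv`, which accepts path-like objects.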

In [10]:
input_.columns.tolist()

['name', 'city', 'country', 'tags', 'url', 'latitude', 'longitude']

In [11]:
print("OKW entries: {r[0]}, columns = {r[1]}".format(r=input_.shape))

OKW entries: 151, columns = 7


In [12]:
# Note: notebooks will change to collect only raw data and to generate DB metadata in preparation for the data aggregation process. Cleaning and sorting will be arranged in subsequent notebooks for data validation and cleansing.

In [13]:
"""
meta.json
Entries, column_names, creation_date
"""

In [None]:
metadata = {
    "source": org_name,
    "url": url,
    "csv": file_name,
    "records": input_.shape[0],
    "columns": input_.shape[1],
    "attributes": input_.columns.tolist(),
    "updated": date
}
 
# Serialize the metadata as JSON
metadata_json = json.dumps(metadata, indent=4)

# Write the metadata next to the CSV, using the same file name with a .json extension
with open("../data/" + file_name + ".json", "w") as outfile:
    outfile.write(metadata_json)
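
Since the metadata file describes the CSV, a later aggregation step could check that the two agree. The following is a minimal sketch of such a check using only the standard library. `metadata_matches_csv` is a hypothetical helper, and the demo files are made up. It assumes the CSV was written by `DataFrame.to_csv` with the default index, which adds an unnamed first column.

```python
import csv
import json
import tempfile
from pathlib import Path


def metadata_matches_csv(json_path, csv_path):
    """Check that the metadata's record count and attributes match the CSV."""
    with open(json_path, encoding="utf-8") as handle:
        metadata = json.load(handle)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    header, records = rows[0], rows[1:]
    # to_csv writes the index as an unnamed first column; skip it if present.
    if header and header[0] == "":
        header = header[1:]
    return metadata["records"] == len(records) and metadata["attributes"] == header


# Made-up demo files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "raw_demo.csv"
    json_path = Path(tmp) / "raw_demo.json"
    csv_path.write_text(",name,city\n0,Demo Lab,Paris\n1,Other Lab,Lyon\n", encoding="utf-8")
    json_path.write_text(
        json.dumps({"records": 2, "attributes": ["name", "city"]}), encoding="utf-8"
    )
    is_consistent = metadata_matches_csv(json_path, csv_path)
    print(is_consistent)
```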