- VISUALIZATION PROJECT Geospatial Business Intelligence (BI)
    * Make a geospatial analysis of the `companies` dataset
    * Things you know:
        - You have a software company with 50 employees
        - The company creates video games
        - Roles in your company: 20 developers, 20 designers/creatives/UX/UI and 10 executives/managers
    * Do an analysis about placing the new company offices in the best environment based on the following criteria:
        - There should be software engineers working around
        - The surroundings must have a good ratio of big companies vs startups
        - Ensure you have in your surroundings companies that cover the interests of your team
        - Avoid old companies, prefer recently created ones

In [220]:
from pymongo import MongoClient
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
pd.set_option('display.max_columns', 500)

I connect to the database and select the categories I am interested in:
- Web, software and video game companies
- Companies with at most 50 or at least 250 employees (small and large companies)
- Companies founded in 2008 or later
- Companies with information on latitude, longitude, city, country, name, etc.

In [221]:
client = MongoClient('localhost', 27017)
db = client['companies']
collection = db.companies
data = collection.find(
    {'$and': [
        # Category: web, software or video games
        {'$or': [
            {'category_code': 'web'},
            {'category_code': 'software'},
            {'category_code': 'games_video'}]},
        # Size: at most 50 or at least 250 employees
        {'$or': [
            {'number_of_employees': {'$lte': 50}},
            {'number_of_employees': {'$gte': 250}}]},
        # Location fields and name must be present
        {'offices.latitude': {'$ne': None}},
        {'offices.longitude': {'$ne': None}},
        {'offices.city': {'$ne': None}},
        {'offices.country_code': {'$ne': None}},
        {'name': {'$ne': None}},
        # Founded in 2008 or later, with at least one office
        {'founded_year': {'$gte': 2008}},
        {'offices': {'$ne': []}}]},
    # Projection: keep only the fields needed for the analysis
    {'name': 1, 'category_code': 1, 'number_of_employees': 1,
     'founded_year': 1, 'offices.country_code': 1, 'offices.city': 1,
     'offices.latitude': 1, 'offices.longitude': 1, '_id': 0})

# find() returns a lazy cursor, so printing it shows the cursor object, not the documents
print(data)

<pymongo.cursor.Cursor object at 0x7f8358ebf7f0>
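
A minimal sketch of a more compact filter that should select the same documents, assuming the same field names as the query above. It uses MongoDB's `$in` operator and the implicit AND between top-level keys instead of nested `$and`/`$or`. The helper `build_filter` and its parameters are hypothetical, introduced only for illustration.

```python
def build_filter(categories, small_max=50, big_min=250, min_year=2008):
    """Return a MongoDB filter dict (hypothetical helper, not part of the notebook)."""
    query = {
        # $in replaces the three-way $or on category_code
        'category_code': {'$in': list(categories)},
        # Company size: at most small_max or at least big_min employees
        '$or': [
            {'number_of_employees': {'$lte': small_max}},
            {'number_of_employees': {'$gte': big_min}},
        ],
        'founded_year': {'$gte': min_year},
        'offices': {'$ne': []},
        'name': {'$ne': None},
    }
    # Every location field must be present (not null)
    for field in ('latitude', 'longitude', 'city', 'country_code'):
        query['offices.' + field] = {'$ne': None}
    return query


query = build_filter(['web', 'software', 'games_video'])
```

The resulting dictionary could be passed as the first argument of `collection.find`, together with the same projection.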


I flatten the results with json_normalize, which yields one row per office.

In [222]:
data1 = json_normalize(data = data, record_path = 'offices', meta = ['name', 'category_code', 'number_of_employees', 'founded_year'])
data1.head()

Unnamed: 0,city,country_code,latitude,longitude,name,category_code,founded_year,number_of_employees
0,San Francisco,USA,37.782263,-122.392142,GoingOn,software,2008,40
1,Bellevue,USA,47.597965,-122.151158,YouBeQB,games_video,2008,8
2,San Francisco,USA,37.781265,-122.393229,Crunchyroll,games_video,2008,50
3,San Mateo,USA,37.566879,-122.323895,Fixya,web,2013,30
4,Norderstedt,DEU,53.707739,10.023246,alluc,games_video,2009,7
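
To illustrate what `record_path='offices'` and `meta=[...]` do, here is a rough plain-Python sketch of the same flattening, assuming each document carries an `offices` list of dictionaries. The function `flatten_offices` is hypothetical, and the sample document is built from the first row of the table above.

```python
def flatten_offices(docs, meta):
    """Produce one flat row per office, copying the meta fields from the parent document."""
    rows = []
    for doc in docs:
        for office in doc.get('offices', []):
            row = dict(office)  # office-level fields: city, latitude, ...
            for key in meta:
                # Unlike json_normalize, a missing meta key becomes None here
                row[key] = doc.get(key)
            rows.append(row)
    return rows


sample = [{
    'name': 'GoingOn', 'category_code': 'software',
    'number_of_employees': 40, 'founded_year': 2008,
    'offices': [{'city': 'San Francisco', 'country_code': 'USA',
                 'latitude': 37.782263, 'longitude': -122.392142}],
}]
rows = flatten_offices(
    sample, ['name', 'category_code', 'number_of_employees', 'founded_year'])
```

A company with several offices would contribute several rows, each repeating its name, category, size and founding year.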


In [223]:
def get_lat(coord):
    # Latitude of the first office, or None if it is missing
    try:
        return coord[0]['latitude']
    except (IndexError, KeyError, TypeError):
        return None

In [224]:
def get_long(coord):
    # Longitude of the first office, or None if it is missing
    try:
        return coord[0]['longitude']
    except (IndexError, KeyError, TypeError):
        return None

In [225]:
# One GeoJSON-style Point per row; coordinates are [longitude, latitude]
serie = [{'type': 'Point', 'coordinates': [lon, lat]}
         for lon, lat in zip(data1['longitude'], data1['latitude'])]
serie

[{'coordinates': [-122.392142, 37.782263], 'type': 'Point'},
 {'coordinates': [-122.151158, 47.597965], 'type': 'Point'},
 {'coordinates': [-122.393229, 37.781265], 'type': 'Point'},
 {'coordinates': [-122.323895, 37.566879], 'type': 'Point'},
 {'coordinates': [10.023246, 53.707739], 'type': 'Point'},
 {'coordinates': [-118.461884, 34.031764], 'type': 'Point'},
 {'coordinates': [-85.717393, 38.257035], 'type': 'Point'},
 {'coordinates': [-122.187925, 47.704875], 'type': 'Point'},
 {'coordinates': [151.039775, -34.054416], 'type': 'Point'},
 {'coordinates': [51.5082, 25.2948], 'type': 'Point'},
 {'coordinates': [-122.404488, 37.790032], 'type': 'Point'},
 {'coordinates': [3.165225, 50.694712], 'type': 'Point'},
 {'coordinates': [-74.370313, 40.219214], 'type': 'Point'},
 {'coordinates': [-113.5291728, 53.4665377], 'type': 'Point'},
 {'coordinates': [-119.306607, 37.269175], 'type': 'Point'},
 {'coordinates': [-122.294847, 37.534317], 'type': 'Point'},
 {'coordinates': [24.704663, 60.214

I build a list of dictionaries, each holding the coordinates and a type, which I will need later for geolocation. Each one follows the GeoJSON Point layout, with longitude first and latitude second.
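
A minimal sketch of building one GeoJSON-style Point dictionary like those in the list above, with a basic range check. It assumes the GeoJSON convention of longitude first, then latitude. The helper `to_point` is hypothetical and not part of the notebook.

```python
def to_point(longitude, latitude):
    """Return a GeoJSON-style Point dict after validating the coordinate ranges."""
    if not -180 <= longitude <= 180:
        raise ValueError('longitude out of range: %r' % (longitude,))
    if not -90 <= latitude <= 90:
        raise ValueError('latitude out of range: %r' % (latitude,))
    # GeoJSON order is [longitude, latitude]
    return {'type': 'Point', 'coordinates': [longitude, latitude]}


point = to_point(-122.392142, 37.782263)
```

The range check catches swapped arguments whenever the longitude exceeds 90 in absolute value, as it does for the San Francisco offices.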

In [226]:
data1.head()

Unnamed: 0,city,country_code,latitude,longitude,name,category_code,founded_year,number_of_employees
0,San Francisco,USA,37.782263,-122.392142,GoingOn,software,2008,40
1,Bellevue,USA,47.597965,-122.151158,YouBeQB,games_video,2008,8
2,San Francisco,USA,37.781265,-122.393229,Crunchyroll,games_video,2008,50
3,San Mateo,USA,37.566879,-122.323895,Fixya,web,2013,30
4,Norderstedt,DEU,53.707739,10.023246,alluc,games_video,2009,7


I add that list to the dataset as a new column.

In [227]:
data1['Point'] = serie

In [228]:
data1.head()

Unnamed: 0,city,country_code,latitude,longitude,name,category_code,founded_year,number_of_employees,Point
0,San Francisco,USA,37.782263,-122.392142,GoingOn,software,2008,40,"{'coordinates': [-122.392142, 37.782263], 'typ..."
1,Bellevue,USA,47.597965,-122.151158,YouBeQB,games_video,2008,8,"{'coordinates': [-122.151158, 47.597965], 'typ..."
2,San Francisco,USA,37.781265,-122.393229,Crunchyroll,games_video,2008,50,"{'coordinates': [-122.393229, 37.781265], 'typ..."
3,San Mateo,USA,37.566879,-122.323895,Fixya,web,2013,30,"{'coordinates': [-122.323895, 37.566879], 'typ..."
4,Norderstedt,DEU,53.707739,10.023246,alluc,games_video,2009,7,"{'coordinates': [10.023246, 53.707739], 'type'..."


I export the dataset in JSON format, one record per line.

In [229]:
data1.to_json('dataclean.json',orient='records', lines=True)
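
With `orient='records'` and `lines=True`, the file holds one JSON object per line. A short sketch of reading such a file back with the standard library, assuming that layout. The helper `read_json_lines` is illustrative, and the sample line is a hypothetical record built from the first row shown earlier.

```python
import io
import json


def read_json_lines(handle):
    """Parse a JSON Lines stream into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in handle if line.strip()]


# In-memory stand-in for the exported file
sample = io.StringIO(
    '{"city":"San Francisco","country_code":"USA","latitude":37.782263,'
    '"longitude":-122.392142,"name":"GoingOn"}\n'
)
records = read_json_lines(sample)
```

For the real file, `open('dataclean.json')` would replace the `io.StringIO` object. `pd.read_json('dataclean.json', lines=True)` should load it back into a DataFrame.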