# Sparkify Analysis

In this notebook, I explore some of the data loaded into the Sparkify database on Redshift.

In [6]:
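import configparser  # reads the cluster settings from dwh.cfg
import psycopg2      # PostgreSQL adapter, also used to connect to Redshift
import bokeh         # plotting (for the planned heatmap)

import pandas as pd
import numpy as np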


## Data acquisition

In [8]:
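# Getting data from Redshift: read the connection settings from the config file
conf = configparser.ConfigParser()
conf.read('../dwh.cfg')

try:
    # The values in the [CLUSTER] section must be listed in the order
    # host, dbname, user, password, port to fill the format string correctly
    conn = psycopg2.connect("host={} dbname={} user={} password={} port={}".format(*conf['CLUSTER'].values()))
    cur = conn.cursor()

    # Reached only if the connection succeeded
    print('Connected to Redshift.')

except Exception as e:
    # Later cells use `conn`, so they will fail if this step did not connect
    print(f'Error while connecting: {e}')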


Connected to Redshift.
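The `format(*conf['CLUSTER'].values())` call above only works if the keys of the `[CLUSTER]` section appear in exactly the order host, dbname, user, password, port. A minimal sketch of a positional versus a keyed lookup, assuming a hypothetical config whose key names (`host`, `db_name`, `db_user`, `db_password`, `db_port`) and values are placeholders that may not match the real `dwh.cfg`:

```python
import configparser

# Hypothetical [CLUSTER] section; key names and values are placeholders,
# not the contents of the real ../dwh.cfg
SAMPLE_CFG = """
[CLUSTER]
host = example-cluster.redshift.amazonaws.com
db_name = dev
db_user = awsuser
db_password = secret
db_port = 5439
"""

conf = configparser.ConfigParser()
conf.read_string(SAMPLE_CFG)
cluster = conf['CLUSTER']

# Positional version (as in the notebook): correct only if the keys are
# written in the order host, dbname, user, password, port
dsn_positional = "host={} dbname={} user={} password={} port={}".format(*cluster.values())

# Keyed version: independent of the order in which the keys are written
dsn_keyed = "host={} dbname={} user={} password={} port={}".format(
    cluster['host'], cluster['db_name'], cluster['db_user'],
    cluster['db_password'], cluster['db_port'],
)

print(dsn_keyed)
```

`psycopg2.connect` also accepts the same settings as keyword arguments (`host=...`, `dbname=...`, `user=...`, `password=...`, `port=...`), which avoids building the string by hand.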


## Sparkify Analytics

Because Sparkify wants to analyze its data, it is important to understand how its users behave. Below are some possible research directions the company could pursue to learn from its data.

### Artists

One direction is to investigate where frequently played artists come from. As a first step, we can look up the origin of every artist or band in Sparkify's database.
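Ranking origins by how often artists are *played*, rather than by how many artists come from each place, requires joining play events with the artist table. A sketch on toy data, assuming a hypothetical play log `plays` with one row per play and an `artist_id` column (the real Sparkify table and column names may differ):

```python
import pandas as pd

# Toy stand-ins: `artists` mirrors the artists table; `plays` is a
# hypothetical play log with one row per play and an artist_id column
artists = pd.DataFrame({
    'artist_id': ['A1', 'A2', 'A3'],
    'name': ['Panda', 'Azure Ray', 'The Human League'],
    'location': ['Monterrey, NL, México', '', 'Sheffield, Yorkshire, England'],
})
plays = pd.DataFrame({'artist_id': ['A1', 'A3', 'A3', 'A3', 'A2', 'A9']})

# Count plays per artist
plays_per_artist = plays.groupby('artist_id').size().reset_index(name='plays')

# Attach each artist's location (plays of unknown artists such as A9 are dropped
# by the inner join), then total the plays per location
ranked = (
    artists.merge(plays_per_artist, on='artist_id', how='inner')
    .groupby('location')['plays']
    .sum()
    .sort_values(ascending=False)
)
print(ranked)
```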

In [26]:
artist_data = pd.read_sql("SELECT * FROM artists", conn)
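`SELECT *` copies the whole artists table into the notebook. When only an aggregate is needed, the counting can be done in the database instead. A sketch, with a hypothetical helper `top_locations`, demonstrated on an in-memory SQLite table standing in for Redshift:

```python
import sqlite3
import pandas as pd

def top_locations(conn, n=20):
    """Return the n most common artist locations, counted inside the database."""
    n = int(n)  # only an integer is interpolated into the SQL text
    query = f"""
        SELECT location, COUNT(*) AS n_artists
        FROM artists
        GROUP BY location
        ORDER BY n_artists DESC
        LIMIT {n}
    """
    return pd.read_sql(query, conn)

# Demo on an in-memory SQLite database standing in for Redshift
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE artists (artist_id TEXT, location TEXT)')
demo.executemany('INSERT INTO artists VALUES (?, ?)', [
    ('A1', 'London, England'), ('A2', 'London, England'),
    ('A3', 'Chicago, IL'), ('A4', ''),
])
result = top_locations(demo, n=2)
print(result)
```

Against Redshift, the same function would be called with the `conn` opened earlier.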


In [33]:
# First 10 rows of artist_data
artist_data.head(10)

| | artist_id | name | location | latitude | longitude |
|---|---|---|---|---|---|
| 0 | AR00DG71187B9B7FCB | Basslovers United | | | |
| 1 | AR00FVC1187FB5BE3E | Panda | Monterrey, NL, México | 25.67084 | -100.30953 |
| 2 | AR00MQ31187B9ACD8F | Chris Carrier | | | |
| 3 | AR01WHF1187B9B53B8 | Lullatone | Nagoya, Japan | | |
| 4 | AR026BB1187B994DC3 | Ijahman Levi | | | |
| 5 | AR039B11187B9B30D0 | John Williams | NEW YORK, New York | | |
| 6 | AR03Z7E1187FB44816 | The Colourfield Featuring Sinead O'Connor | Manchester, England | 53.4796 | -2.24881 |
| 7 | AR040M31187B98CA41 | The Bug Featuring Spaceape | | | |
| 8 | AR040RJ1187FB4D2AB | Azure Ray | | | |
| 9 | AR049S81187B9AE8A5 | The Human League | Sheffield, Yorkshire, England | 53.38311 | -1.46454 |


In [31]:
# How many artists are in the database? (counts the rows of artist_data)
print('Number of artists in database: {}'.format(artist_data.shape[0]))

Number of artists in database: 10025
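`shape[0]` counts rows, which equals the number of unique artists only if `artist_data` holds exactly one row per `artist_id`. A quick check on a made-up toy frame shows the difference between the two counts:

```python
import pandas as pd

# Toy frame (made-up data) with one artist listed twice
toy = pd.DataFrame({
    'artist_id': ['AR01', 'AR02', 'AR02', 'AR03'],
    'name': ['Panda', 'Azure Ray', 'Azure Ray', 'Lullatone'],
})

n_rows = toy.shape[0]                  # every row, duplicates included
n_unique = toy['artist_id'].nunique()  # distinct artist IDs only
print(f'Rows: {n_rows}, unique artists: {n_unique}')
```

Running `artist_data['artist_id'].nunique()` on the real data would confirm whether the 10025 above are all distinct artists.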


In [35]:
# Where are they from? The 20 most common artist locations
artist_data.location.value_counts().head(20)

                      4805
London, England        147
Los Angeles, CA        146
New York, NY           107
California - LA         75
Chicago, IL             74
NY - New York City      70
Brooklyn, NY            52
Detroit, MI             46
California              45
Philadelphia, PA        45
Texas                   40
San Francisco, CA       38
Atlanta, GA             34
United States           33
London                  33
Boston, MA              32
Seattle, WA             31
Memphis, TN             30
CANADA - Ontario        30
Name: location, dtype: int64
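The largest group in this count (4805) has a blank label, i.e. artists with an empty location, and the same place appears under several spellings (for example "New York, NY" and "NY - New York City"). A sketch that treats empty or whitespace-only locations as missing before counting, shown on a toy `Series` standing in for `artist_data.location`; merging different spellings of one place would additionally need a hand-made mapping, which is not shown:

```python
import pandas as pd

# Toy location column; the real one is artist_data.location
locations = pd.Series(['', 'London, England', ' London, England ', None, '', 'Chicago, IL'])

# Strip surrounding whitespace, then mark empty strings as missing (NaN)
stripped = locations.str.strip()
cleaned = stripped.mask(stripped == '')

print('Artists without a location:', cleaned.isna().sum())
print(cleaned.value_counts().head(20))  # value_counts drops missing values by default
```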

In [None]:
# A geographic heatmap of all artists
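A heatmap first needs artist counts per map cell. A minimal sketch that bins the `longitude`/`latitude` columns into a regular grid with `np.histogram2d`, skipping rows without coordinates; the helper name `location_grid` and the 10-degree cell size are illustrative choices, and the toy coordinates are copied from the sample rows above:

```python
import numpy as np
import pandas as pd

def location_grid(df, cell_deg=10):
    """Count artists per (longitude, latitude) cell; rows without coordinates are skipped."""
    coords = df[['longitude', 'latitude']].dropna()
    counts, lon_edges, lat_edges = np.histogram2d(
        coords['longitude'], coords['latitude'],
        bins=(int(360 / cell_deg), int(180 / cell_deg)),
        range=[[-180, 180], [-90, 90]],
    )
    return counts, lon_edges, lat_edges

# Toy coordinates from the sample rows above; NaN marks a missing location
sample = pd.DataFrame({
    'latitude':  [25.67084, np.nan, 53.4796, 53.38311],
    'longitude': [-100.30953, np.nan, -2.24881, -1.46454],
})
counts, lon_edges, lat_edges = location_grid(sample)
print(counts.shape, int(counts.sum()))
```

To display the grid, it could be transposed so that rows correspond to latitude and passed to Bokeh's `image` glyph, for example `figure().image(image=[counts.T], x=-180, y=-90, dw=360, dh=180, palette='Viridis256')`.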