### Notebook for analysis of cleansed data

This query counts the artists of each gender in our data. The Artist table is JOINED with the Gender table and then GROUPED BY each artist's gender id and the corresponding English name for that gender type. COUNT is the aggregate function that counts how many artists belong to each gender type. Because the join drops artists whose gender is NULL, a second query counts those artists separately and its result is appended with UNION ALL. ORDER BY sorts the combined results from the least populous gender type to the most populous.

In [50]:
%%bigquery
select count(artist.gender) as Counts, gender.gender_type as Type from musicbrainz_modeled.Artist_Beam_DF as artist
join musicbrainz_modeled.Gender as gender on artist.gender = gender.gender_id
group by artist.gender, gender.gender_type
union all
select count(*), 'nulls' from musicbrainz_modeled.Artist_Beam_DF as a where a.gender is null
order by counts

Unnamed: 0,Counts,Type
0,380,Not applicable
1,746,Other
2,140187,Female
3,506998,Male
4,956060,nulls
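
The split between `count(artist.gender)` and `count(*)` matters because artists with a NULL gender never match a row in the Gender table. The sketch below reproduces the shape of the query on a few invented rows. It uses Python's built-in `sqlite3` module instead of BigQuery so that it runs locally, and the table names `Artist` and `Gender` are simplified stand-ins for the notebook's tables.

```python
import sqlite3

# Toy stand-ins for the artist and gender tables; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Gender (gender_id integer, gender_type text);
    create table Artist (artist_id integer, gender integer);
    insert into Gender values (1, 'Male'), (2, 'Female');
    insert into Artist values
        (10, 1), (11, 1), (12, 2), (13, null), (14, null), (15, null);
""")

# Same structure as the notebook query: the inner join drops NULL genders,
# so a second select counts them and UNION ALL appends that row.
gender_counts = conn.execute("""
    select count(artist.gender) as Counts, gender.gender_type as Type
    from Artist as artist
    join Gender as gender on artist.gender = gender.gender_id
    group by artist.gender, gender.gender_type
    union all
    select count(*), 'nulls' from Artist as a where a.gender is null
    order by Counts
""").fetchall()

print(gender_counts)
```

With these toy rows the ascending ORDER BY places the smallest group first, which matches the ordering of the real output above.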


This query finds the places in each area that have long names. The Area table is JOINED with the Place table on area_id and GROUPED BY place name, area name, and area_id, so each distinct combination appears once. Only the groups that HAVE a place name longer than 15 characters are retained. The results are ORDERED BY area_id so that continental and country-level areas are shown before smaller, more localized areas.

In [20]:
%%bigquery
select place.place_name as Places, area.area_name as Areas, area.area_id from musicbrainz_modeled.Place_Beam_DF as place
join musicbrainz_modeled.Area_Beam_DF as area on area.area_id = place.area_id
group by place.place_name, area.area_name, area.area_id
having length(place.place_name) > 15
order by area_id

Unnamed: 0,Places,Areas,area_id
0,Core Music Factory,Andorra,5
1,Rothera Research Station,Antarctica,8
2,Colombia Records,Argentina,10
3,Digisound Mastering,Argentina,10
4,The Garden Mastering,Argentina,10
...,...,...,...
19386,Ye Olde Orchard Pub & Grill,Monkland Village,118964
19387,Segal Centre for Performing Arts,Snowdon,118965
19388,Montreal Sound Studio,Snowdon,118965
19389,Playhouse Studio,Snowdon,118965
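
Since the HAVING condition here does not use an aggregate function, the GROUP BY and HAVING pair behaves like SELECT DISTINCT with a WHERE filter. The sketch below checks that equivalence on a few rows using Python's built-in `sqlite3` module (an assumption made so the example runs without BigQuery). The table names `Area` and `Place` are simplified stand-ins, and the short place name and the duplicate row are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Area (area_id integer, area_name text);
    create table Place (place_name text, area_id integer);
    insert into Area values (5, 'Andorra'), (10, 'Argentina');
    insert into Place values
        ('Core Music Factory', 5),
        ('Core Music Factory', 5),
        ('Short Name', 10),
        ('Digisound Mastering', 10);
""")

# Form used in the notebook: group, then keep groups with long names.
grouped = conn.execute("""
    select place.place_name, area.area_name, area.area_id
    from Place as place
    join Area as area on area.area_id = place.area_id
    group by place.place_name, area.area_name, area.area_id
    having length(place.place_name) > 15
    order by area.area_id
""").fetchall()

# Equivalent form: filter rows first, then remove duplicates.
filtered = conn.execute("""
    select distinct place.place_name, area.area_name, area.area_id
    from Place as place
    join Area as area on area.area_id = place.area_id
    where length(place.place_name) > 15
    order by area.area_id
""").fetchall()

print(grouped == filtered, grouped)
```

Both forms collapse the duplicated row and drop the 10-character name.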


This query associates events with their event types. The Event table is JOINED with the Event_Type table and then GROUPED BY event name and type, so each distinct pairing of name and type appears once.

In [31]:
%%bigquery
select event.event_name, et.type from musicbrainz_modeled.Event_Beam_DF as event
join musicbrainz_modeled.Event_Type as et on event.event_type = et.event_type_id
group by event.event_name, et.type

Unnamed: 0,event_name,type
0,Nicolas Fortin Scholarship,Masterclass/Clinic
1,Yamaha Tone Made Easy,Masterclass/Clinic
2,Aids Walk+Ride Charity,Masterclass/Clinic
3,Donegal Fiddlers’ Summer School 1995,Masterclass/Clinic
4,Saleem Ashkar Piano Masterclass,Masterclass/Clinic
...,...,...
30968,コミックマーケット91,Convention/Expo
30969,コミックマーケット93,Convention/Expo
30970,コミックマーケット95,Convention/Expo
30971,コミックマーケット97,Convention/Expo


This query identifies which languages have been used in more than 10,000 releases. The Language table is JOINED with the Release table so that language names can be paired with the language id of each release. The result is GROUPED BY the language of each release, and only the groups that HAVE more than 10,000 releases are kept. Finally, the results are ORDERED so that the language used for the most releases is shown first.

In [34]:
%%bigquery
select Lang.language_name as ReleaseLanguage, count(Rel.rel_id) as NumRels
from musicbrainz_modeled.Release_Beam_DF as Rel
    join musicbrainz_modeled.Language_Beam_DF as Lang
        on Rel.language = Lang.language_id
group by Lang.language_name
    having NumRels > 10000
order by NumRels desc

Unnamed: 0,ReleaseLanguage,NumRels
0,English,1633739
1,Japanese,115896
2,[Multiple languages],79911
3,German,71217
4,Spanish,67811
5,French,64727
6,Italian,22383
7,Portuguese,21935
8,Russian,18361
9,Finnish,18128
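
The GROUP BY, HAVING, and ORDER BY steps can be traced on a small example. This is a sketch only: it uses Python's built-in `sqlite3` module instead of BigQuery, simplified table names (`Release`, `Language`), invented rows, and a threshold of 1 in place of 10,000. The HAVING clause repeats the `count(...)` expression rather than the `NumRels` alias to stay within portable SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Language (language_id integer, language_name text);
    create table Release (rel_id integer, language integer);
    insert into Language values (1, 'English'), (2, 'Japanese'), (3, 'Finnish');
    insert into Release values
        (100, 1), (101, 1), (102, 1),
        (103, 2), (104, 2),
        (105, 3),
        (106, null);
""")

THRESHOLD = 1  # stands in for the 10,000 used in the notebook

popular_languages = conn.execute("""
    select Lang.language_name as ReleaseLanguage, count(Rel.rel_id) as NumRels
    from Release as Rel
    join Language as Lang on Rel.language = Lang.language_id
    group by Lang.language_name
    having count(Rel.rel_id) > ?
    order by NumRels desc
""", (THRESHOLD,)).fetchall()

print(popular_languages)
```

Finnish has a single release and is removed by HAVING. The release with a NULL language is never counted, because the inner join finds no matching language row for it.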


This query identifies the ten cities where the most music artists started their careers. The Area table is JOINED with the Artist table so that the artists can be GROUPED BY the cities (areas with area_type = 3) where they started their careers. The results are then ORDERED BY the number of artists each city produced, in descending order, and LIMITED to the first ten rows.

In [36]:
%%bigquery
select a.area_name as StartCity, count(*) as NumArtists
from musicbrainz_modeled.Artist_Beam_DF as artist
    join musicbrainz_modeled.Area_Beam_DF as a
        on artist.begin_area_id = a.area_id
    where a.area_type = 3
group by a.area_name
order by NumArtists desc
limit 10

Unnamed: 0,StartCity,NumArtists
0,London,5181
1,New York,3456
2,Los Angeles,3149
3,Paris,2383
4,Chicago,2330
5,Berlin,1904
6,Wien,1667
7,Toronto,1326
8,Philadelphia,1294
9,Buenos Aires,1213
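
One design choice worth noting is that the query groups by `area_name` alone, so two different cities that share a name would be counted together. Whether the real data contains such cases was not checked here. The sketch below shows the effect on invented rows using Python's built-in `sqlite3` module; the table names and the duplicate "Springfield" entries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Area (area_id integer, area_name text, area_type integer);
    create table Artist (artist_id integer, begin_area_id integer);
    insert into Area values
        (1, 'Springfield', 3),
        (2, 'Springfield', 3),
        (3, 'London', 3),
        (4, 'Canada', 1);
    insert into Artist values
        (10, 1), (11, 1), (12, 2), (13, 3), (14, 3), (15, 4);
""")

# Grouping as in the notebook: same-named cities are merged.
by_name = conn.execute("""
    select a.area_name as StartCity, count(*) as NumArtists
    from Artist as artist
    join Area as a on artist.begin_area_id = a.area_id
    where a.area_type = 3
    group by a.area_name
    order by NumArtists desc, StartCity
    limit 10
""").fetchall()

# Alternative: include area_id in the grouping to keep cities separate.
by_id = conn.execute("""
    select a.area_name as StartCity, count(*) as NumArtists
    from Artist as artist
    join Area as a on artist.begin_area_id = a.area_id
    where a.area_type = 3
    group by a.area_id, a.area_name
    order by NumArtists desc, a.area_id
    limit 10
""").fetchall()

print(by_name)
print(by_id)
```

The artist whose begin area is not of type 3 is excluded by the WHERE clause in both variants.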


This query uses a nested query to identify the extinct label (a label that is no longer active) with the longest period of activity, measured in years. The nested query finds the MAX difference between a label's end year and begin year. The outer query then filters the Label table to the label(s) whose duration equals that maximum and returns their name, begin year, and end year. Labels without an end year produce a NULL duration, so they never match.

In [39]:
%%bigquery
select l1.label_name, l1.begin_year, l1.end_year, (l1.end_year - l1.begin_year) as duration
from musicbrainz_modeled.Label_Beam_DF as l1
where (l1.end_year - l1.begin_year) = (
  select max(l2.end_year - l2.begin_year) 
  from musicbrainz_modeled.Label_Beam_DF as l2
  )

Unnamed: 0,label_name,begin_year,end_year,duration
0,Cotta'sche Buchhandlung,1659,1977,318
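
The way NULL arithmetic limits this query to labels with both a begin year and an end year can be seen on a small example. The sketch below uses Python's built-in `sqlite3` module instead of BigQuery, with a simplified `Label` table and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Label (label_name text, begin_year integer, end_year integer)"
)
conn.executemany(
    "insert into Label values (?, ?, ?)",
    [
        ("Closed Short", 1950, 1960),
        ("Closed Long", 1800, 1900),
        ("Still Active", 1700, None),  # no end year: duration is NULL
    ],
)

# The subquery computes the maximum duration; max() ignores NULL values.
# The outer comparison is never true for a NULL duration.
longest = conn.execute("""
    select l1.label_name, l1.begin_year, l1.end_year,
           (l1.end_year - l1.begin_year) as duration
    from Label as l1
    where (l1.end_year - l1.begin_year) = (
        select max(l2.end_year - l2.begin_year) from Label as l2
    )
""").fetchall()

print(longest)
```

The label with the earliest begin year is not returned, because its missing end year makes the duration NULL.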


### Make Views for Data Studio Report

A report will be made on the ten cities where the most artists started their careers. The view below can be imported into Data Studio for that report.

In [2]:
%%bigquery
create or replace view musicbrainz_modeled.v_ten_artist_cities as (
select a.area_name as StartCity, count(*) as NumArtists
from `earnest-keep-266820.musicbrainz_modeled.Artist_Beam_DF` as artist
    join `earnest-keep-266820.musicbrainz_modeled.Area_Beam_DF` as a
        on artist.begin_area_id = a.area_id
    where a.area_type = 3
group by a.area_name
order by NumArtists desc
limit 10
)

Another view is made so that the gender counts for artists can be included in the Data Studio report.

In [7]:
%%bigquery
create or replace view musicbrainz_modeled.v_artist_genders as (
select count(artist.gender) as Counts, gender.gender_type as Type 
from `earnest-keep-266820.musicbrainz_modeled.Artist_Beam_DF` as artist
join `earnest-keep-266820.musicbrainz_modeled.Gender` as gender on artist.gender = gender.gender_id
group by artist.gender, gender.gender_type
union all
select count(*), 'nulls' from `earnest-keep-266820.musicbrainz_modeled.Artist_Beam_DF` as a where a.gender is null
order by counts
)
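
A view stores the query text, not a snapshot of its results, so a report built on the views above reflects the underlying tables each time the view is read. The sketch below illustrates that behavior with Python's built-in `sqlite3` module and invented rows. SQLite has no `create or replace view`, so the sketch uses `drop view if exists` followed by `create view` as a stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Area (area_id integer, area_name text, area_type integer);
    create table Artist (artist_id integer, begin_area_id integer);
    insert into Area values (1, 'London', 3), (2, 'Paris', 3);
    insert into Artist values (10, 1), (11, 1), (12, 2);

    drop view if exists v_ten_artist_cities;
    create view v_ten_artist_cities as
        select a.area_name as StartCity, count(*) as NumArtists
        from Artist as artist
        join Area as a on artist.begin_area_id = a.area_id
        where a.area_type = 3
        group by a.area_name
        order by NumArtists desc
        limit 10;
""")

before = conn.execute("select * from v_ten_artist_cities").fetchall()

# New rows in the base table show up the next time the view is queried.
conn.executemany("insert into Artist values (?, ?)", [(13, 2), (14, 2)])
after = conn.execute("select * from v_ten_artist_cities").fetchall()

print(before)
print(after)
```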