# Initial research questions

* comparative analysis of gender representation in artwork creation between born digital and analogue art collections
* with also, potentially, some details on medium, location…
* semantic analysis of the narrative about the artworks or what are the keywords associated with different artwork types

### The story could be:
The internet was supposed to revolutionize things, so how did it do when looking at who makes art and who gets included in collections?


A simple way to plan your work is:

 * choose the research question
 * map the question to pieces of information needed to answer the question (e.g. periods, countings)
 * map the data to specific data types (categorical, numerical, ordinal)
 * choose the plot(s) that better help you to visualise some pattern (e.g. a bar chart)
 * get your data in some form (SPARQL query results)
 * filter/ manipulate your data (select the variables that matter, make operations like countings) 
 * create a data structure that fits the plotting requirements (a table, a JSON etc) including the number of variables needed (e.g. one categorical and one numerical)


In [980]:
#imports 

import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
from plotly.subplots import make_subplots
import pycountry
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import numpy as np
from wordcloud import *
from functools import reduce
pd.set_option('display.max_rows', 200)
from statsmodels.graphics.mosaicplot import mosaic
from plotly.subplots import make_subplots

In [981]:
# upload datasets 
path = './'

#complete DFs
rhz_artworks = pd.read_pickle(path+'Rhizome_data/rhizome_artworks_extra.pkl')
rhz_artists = pd.read_pickle(path+'Rhizome_data/rhizome_artists_extra.pkl')
moma_artists = pd.read_pickle(path+'MOMA_data/pickle/MoMAArtists.pkl')
moma_artworks = pd.read_pickle(path+'MOMA_data/pickle/MoMAartworks.pkl')
moma_artworks_old =  pd.read_pickle(path+'MOMA_data/pickle/old_artworks.pkl')
moma_artworks_old['DateAcquired'] = moma_artworks_old['DateAcquired'].replace('nan', str('0'))
moma_artworks_old['DateAcquired']=moma_artworks_old['DateAcquired'].astype('int')
moma_artworks_new =  pd.read_pickle(path+'MOMA_data/pickle/new_artworks.pkl')
moma_artworks_new['DateAcquired'] = moma_artworks_new['DateAcquired'].replace('nan', str('0'))
moma_artworks_new['DateAcquired']=moma_artworks_new['DateAcquired'].astype('int')
moma_artworks_new =  moma_artworks_new.loc[moma_artworks_new['DateAcquired'] >= 1980]
moma_rhz_compare = moma_artworks_new.loc[moma_artworks_new['DateAcquired'] >= 2000]
moma_rhz_compare = moma_rhz_compare.loc[moma_rhz_compare['DateCreated'] >= 1983]

#MoMA department DFs
moma_arch_cont = pd.read_pickle(path+'MOMA_data/pickle/departments/architecture_design_cont.pkl')
moma_arch_mod = pd.read_pickle(path+'MOMA_data/pickle/departments/architecture_design_mod.pkl')
moma_design_cont = pd.read_pickle(path+'MOMA_data/pickle/departments/architecture_design_img_cont.pkl')
moma_design_mod = pd.read_pickle(path+'MOMA_data/pickle/departments/architecture_design_img_mod.pkl')
moma_draw_cont = pd.read_pickle(path+'MOMA_data/pickle/departments/draws_prints_cont.pkl')
moma_draw_mod = pd.read_pickle(path+'MOMA_data/pickle/departments/draws_prints_mod.pkl')
moma_films_cont = pd.read_pickle(path+'MOMA_data/pickle/departments/films_cont.pkl')
moma_films_mod = pd.read_pickle(path+'MOMA_data/pickle/departments/films_mod.pkl')
moma_fluxus_cont = pd.read_pickle(path+'MOMA_data/pickle/departments/fluxus_cont.pkl')
moma_fluxus_mod = pd.read_pickle(path+'MOMA_data/pickle/departments/fluxus_mod.pkl')
moma_media_cont = pd.read_pickle(path+'MOMA_data/pickle/departments/media_perf_cont.pkl')
moma_media_mod = pd.read_pickle(path+'MOMA_data/pickle/departments/media_perf_mod.pkl')
moma_paint_cont = pd.read_pickle(path+'MOMA_data/pickle/departments/paint_sculp_cont.pkl')
moma_paint_mod = pd.read_pickle(path+'MOMA_data/pickle/departments/paint_sculp_mod.pkl')
moma_photo_cont = pd.read_pickle(path+'MOMA_data/pickle/departments/photo_cont.pkl')
moma_photo_mod = pd.read_pickle(path+'MOMA_data/pickle/departments/photo_mod.pkl')

#Rhizome with text
rhizome_txt_clean = pd.read_pickle(path+'Rhizome_data/rhizome_artworks_extra_text_clean.pkl')
rhizome_txt_stop_kw = pd.read_pickle(path+'Rhizome_data/rhizome_artworks_extra_text_clean_stop_keywords.pkl')

#MoMA with text
moma_arch_cont_text = pd.read_pickle(path+'MOMA_data/pickle/departments/architecture_design_cont_text_final.pkl')
moma_arch_mod_text = pd.read_pickle(path+'MOMA_data/pickle/departments/architecture_design_mod_text_only_final.pkl')
moma_draw_cont_text = pd.read_pickle(path+'MOMA_data/pickle/departments/draws_prints_cont_text_final.pkl')
moma_draw_mod_text = pd.read_pickle(path+'MOMA_data/pickle/departments/draws_prints_mod_text_final.pkl')
moma_films_cont_text = pd.read_pickle(path+'MOMA_data/pickle/departments/films_cont_text_final.pkl')
moma_films_mod_text = pd.read_pickle(path+'MOMA_data/pickle/departments/films_mod_text_final.pkl')
moma_fluxus_cont_text = pd.read_pickle(path+'MOMA_data/pickle/departments/fluxus_cont_text_final.pkl')
moma_fluxus_mod_text = pd.read_pickle(path+'MOMA_data/pickle/departments/fluxus_mod_text_final.pkl')
moma_media_cont_text = pd.read_pickle(path+'MOMA_data/pickle/departments/media_perf_cont_text_final.pkl')
moma_media_mod_text = pd.read_pickle(path+'MOMA_data/pickle/departments/media_perf_mod_text_final.pkl')
moma_paint_cont_text = pd.read_pickle(path+'MOMA_data/pickle/departments/paint_sculp_cont_text_final.pkl')
moma_paint_mod_text = pd.read_pickle(path+'MOMA_data/pickle/departments/paint_sculp_mod_text_final.pkl')
moma_photo_cont_text = pd.read_pickle(path+'MOMA_data/pickle/departments/photo_cont_text_final.pkl')
moma_photo_mod_text = pd.read_pickle(path+'MOMA_data/pickle/departments/photo_mod_text_final.pkl')

#moma text stop
moma_arch_cont_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/architecture_design_cont_text_final_stop.pkl')
moma_arch_mod_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/architecture_design_mod_text_only_final_stop.pkl')
moma_draw_cont_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/draws_prints_cont_text_final_stop.pkl')
moma_draw_mod_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/draws_prints_mod_text_final_stop.pkl')
moma_films_cont_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/films_cont_text_final_stop.pkl')
moma_films_mod_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/films_mod_text_final_stop.pkl')
moma_fluxus_cont_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/fluxus_cont_text_final_stop.pkl')
moma_fluxus_mod_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/fluxus_mod_text_final_stop.pkl')
moma_media_cont_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/media_perf_cont_text_final_stop.pkl')
moma_media_mod_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/media_perf_mod_text_final_stop.pkl')
moma_paint_cont_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/paint_sculp_cont_text_final_stop.pkl')
moma_paint_mod_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/paint_sculp_mod_text_final_stop.pkl')
moma_photo_cont_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/photo_cont_text_final_stop.pkl')
moma_photo_mod_text_stop = pd.read_pickle(path+'MOMA_data/pickle/departments/photo_mod_text_final_stop.pkl')

In [982]:
def getDecade(date):
    decade = str(date)[:-1]
    if decade == '':
       decade = '0'
    decade_int = int(decade)*10
    return decade_int

In [983]:
#set 53 different colors for moma nationalities and 59 for rhiome
momacolorscale = ['#FFC0CB', '#FF69B4', '#DB7093', '#C71585', '#E6E6FA', '#DDA0DD', '#EE82EE', '#9932CC', '#8B008B', '#9370DB', '#483D8B', '#4B0082', '#FA8072', '#DC143C', '#8B0000', '#FF8C00', '#FF4500', '#FFD700', '#FFFF00', '#F0E68C', '#7FFF00', '#32CD32', '#90EE90', '#00FA9A', '#3CB371', '#006400', '#9ACD32', '#6B8E23', '#66CDAA', '#20B2AA', '#5F9EA0', '#4682B4', '#B0C4DE', '#87CEFA', '#6495ED', '#00BFFF', '#1E90FF', '#0000CD', '#191970', '#FFDEAD', '#DEB887', '#BC8F8F', '#DAA520', '#B8860B', '#D2691E', '#8B4513', '#A52A2A', '#F0FFF0', '#708090', '#2F4F4F', '#696969', '#8FBC8F', '#00CED1' ]
rhzcolorscale = ['#FFFAFA','#FFC0CB', '#FF69B4', '#DB7093', '#C71585', '#E6E6FA', '#DDA0DD', '#EE82EE', '#9932CC', '#8B008B', '#9370DB', '#483D8B', '#4B0082', '#FA8072', '#DC143C', '#8B0000', '#FF8C00', '#FF4500', '#FFD700', '#FFFF00', '#F0E68C', '#7FFF00', '#32CD32', '#90EE90', '#00FA9A', '#3CB371', '#006400', '#9ACD32', '#6B8E23', '#66CDAA', '#20B2AA', '#5F9EA0', '#4682B4', '#B0C4DE', '#87CEFA', '#6495ED', '#00BFFF', '#1E90FF', '#0000CD', '#191970', '#FFDEAD', '#DEB887', '#BC8F8F', '#DAA520', '#B8860B', '#D2691E', '#8B4513', '#A52A2A', '#F0FFF0', '#708090', '#2F4F4F', '#696969', '#8FBC8F', '#00CED1', '#F0FFF0', '#F0FFFF', '#F5F5DC', '#FFFFF0', '#FFE4E1' ]
print(len(rhzcolorscale))

59


## 4 What is the nationality representation in each datasets?


Viz: stacked area chart (Chiara) + breakdown by decades 
Story: AMERICA!!! BUT ALSO FRANCE!!!! Basically colonialism. 

does nationality representation change over time as a percentage of the total acquisitions? is there a way to talk about representation based on nationality in the story? 


In [992]:
set(MoMA_complete.Gender)

{'F',
 'F, F',
 'F, F, F',
 'F, F, F, M',
 'F, F, F, M, M, missing',
 'F, F, F, missing',
 'F, F, M',
 'F, F, M, F, M',
 'F, F, M, F, M, M, M, missing',
 'F, F, M, F, missing',
 'F, F, M, M, F, missing',
 'F, F, M, M, M, missing',
 'F, F, M, M, missing',
 'F, F, M, missing',
 'F, F, missing',
 'F, M',
 'F, M, F',
 'F, M, F, M, M',
 'F, M, F, M, M, M',
 'F, M, F, M, M, M, M',
 'F, M, F, M, M, missing, missing',
 'F, M, F, M, missing',
 'F, M, M',
 'F, M, M, F, F, missing',
 'F, M, M, F, M, M',
 'F, M, M, F, M, missing',
 'F, M, M, F, missing',
 'F, M, M, F, missing, missing',
 'F, M, M, M',
 'F, M, M, M, M, F, missing, M',
 'F, M, M, M, M, M, F, M, F, F, M, M, F, M, M, M, M, F, M, M, F, F, M, M, M, M, M, F, F, missing',
 'F, M, M, M, M, M, F, M, M, F, M, missing',
 'F, M, M, M, M, M, M, M, M, F, M, M, M, M, M, M, missing',
 'F, M, M, M, M, M, M, M, M, M, M, M, F, M, M, M, M, missing',
 'F, M, M, M, M, M, M, M, M, M, M, M, M, M, M, M, F, M, M, M, M, M, M, M, M, M, M, M, M, missing',
 'F,

In [996]:
# create working df for MoMA complete create dictionary keeping saved the 'others' nationalities
MoMA_complete = pd.concat([moma_artworks_old,moma_artworks_new])
nationalities = ', '.join(MoMA_complete.Nationality)
nationalities = list(set(nationalities.split(', ')))
df_moma_complete = pd.DataFrame(columns= ["Nationality", "DateAcquired", "Count", "Females", "Males", 'Missing', 'Non-Binary']) 

OthersMoma = dict()
for nat in nationalities:
    nat_sub =  MoMA_complete[MoMA_complete.Nationality == nat]   
     
    nat_sub.DateAcquired = nat_sub.DateAcquired.astype('str')
    years = ', '.join(nat_sub.DateAcquired)
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = nat_sub[nat_sub.DateAcquired == year]
        entries_year = len(year_sub)
        f_count = len(year_sub[year_sub.Gender == 'F'])
        m_count = len(year_sub[year_sub.Gender == 'M'])
        missing_count = len(year_sub[year_sub.Gender == 'missing'])
        nonbinary_count = len(year_sub[year_sub.Gender == 'NB'])

        if entries_year < 100:
            if year not in OthersMoma:
                OthersMoma[year] = set([nat])
            else: 
                new = OthersMoma[year]
                new.add(nat)
                OthersMoma[year] = set(new)
            nationality = 'others'
            
        else: 
            nationality = nat
        df_moma_complete.loc[len(df_moma_complete.index)] = [nationality, year, entries_year, f_count, m_count, missing_count, nonbinary_count]
  

df_moma_complete.DateAcquired = df_moma_complete.DateAcquired.astype('int')
df_moma_complete['Decade'] = 'todo'
df_moma_complete["Decade"] = df_moma_complete["DateAcquired"].apply(lambda date: getDecade(date))
# replot dataset any nationality that has count < 5 becomes 'Others'
MC_working = df_moma_complete[df_moma_complete.DateAcquired > 1929]
MC_working.DateAcquired.astype('int')
MC_working_final = MC_working.groupby(['Nationality','DateAcquired', 'Decade'],as_index=False).agg({'Count': 'sum','Females': 'sum','Males': 'sum','Missing': 'sum','Non-Binary': 'sum'})
MC_working_final = MC_working_final.sort_values(by='Decade')



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

In [985]:
# create working df for RHIZOME complete create dictionary keeping saved the 'others' nationalities
nationalities = ', '.join(rhz_artworks.Nationality)
nationalities = list(set(nationalities.split(', ')))
df_rhz_nats = pd.DataFrame(columns= ["Nationality", "DateAcquired", "Count", "Females", "Males"]) 

OthersRhz = dict()
for nat in nationalities:
    nat_sub =  rhz_artworks[rhz_artworks.Nationality == nat]    
    nat_sub.dateAcquired = nat_sub.dateAcquired.astype('str')
    years = ', '.join(nat_sub.dateAcquired)
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = nat_sub[nat_sub.dateAcquired == year]
        entries_year = len(year_sub)
        f_count = len(year_sub[year_sub.Gender == 'F'])
        m_count = len(year_sub[year_sub.Gender == 'M'])
        missing_count = len(year_sub[year_sub.Gender == 'missing'])
        nonbinary_count = len(year_sub[year_sub.Gender == 'NB'])
        if entries_year < 20: 
            if year not in OthersRhz:
                OthersRhz[year] = set([nat])
            else: 
                new = OthersRhz[year]
                new.add(nat)
                OthersRhz[year] = set(new)
            nationality = 'others'
        else: 
            nationality = nat
        df_rhz_nats.loc[len(df_rhz_nats.index)] = [nat, year, entries_year, f_count, m_count]

df_rhz_nats.DateAcquired = df_rhz_nats.DateAcquired.astype('int')
df_rhz_nats['Decade'] = 'todo'
df_rhz_nats["Decade"] = df_rhz_nats["DateAcquired"].apply(lambda date: getDecade(date))
RH_working = df_rhz_nats[df_rhz_nats.DateAcquired > 0]
RH_working_final = RH_working.groupby(['Nationality','DateAcquired', 'Decade'],as_index=False).agg({'Count': 'sum','Females': 'sum' ,'Males': 'sum','Missing': 'sum','Non-Binary': 'sum'})
RH_working_final = RH_working_final.sort_values(by='Decade')



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

{'2001': {'American ',
  'American/British',
  'American/South African',
  'Argentine',
  'Australian',
  'Austrian',
  'Belgian',
  'Brasilian',
  'British',
  'Canadian',
  'Czech',
  'Dutch',
  'Finnish ',
  'French',
  'German',
  'Iranian',
  'Israeli',
  'Italian',
  'Italian ',
  'Japanese',
  'Mexican',
  'Norwegian',
  'Polish',
  'Russian',
  'Serbian',
  'Slovenian',
  'South Korean',
  'Spanish',
  'Swedish',
  'Taiwanese',
  'Venezuelan'},
 '2002': {'American ',
  'American/South African',
  'Argentine',
  'Argentine ',
  'Australian',
  'Austrian',
  'Belgian',
  'Brasilian',
  'British',
  'British/Canadian',
  'Canadian',
  'Chilean?Chilean',
  'Chinese',
  'German',
  'Greek',
  'Hungarian',
  'Icelandic/American',
  'Indian',
  'Irish',
  'Israeli',
  'Italian',
  'Japanese',
  'Mexican',
  'Norwegian',
  'Polish',
  'Russian',
  'South African',
  'South Korean',
  'Spanish',
  'Swedish',
  'Swiss',
  'Taiwanese'},
 '2003': {'American ',
  'American/British',
  'Amer

In [986]:
# create MOMA df with country names instead of nationality
natios_MOMA = set(df_moma_complete.Nationality)

missing = pd.DataFrame({'Aalborgenser': ['Korean', 'Native American', 'Canadian Inuit'], 'Aalborg': ['Korea', 'United States', 'Canada']})
missing
df_natParse = pd.read_csv('https://raw.githubusercontent.com/knowitall/chunkedextractor/master/src/main/resources/edu/knowitall/chunkedextractor/demonyms.csv')
correct_country = pd.concat([missing, df_natParse])
Country_df=df_moma_complete.copy()

Country_df = Country_df[Country_df.DateAcquired <2000]
for item in natios_MOMA:
    my = correct_country[correct_country['Aalborgenser'] == item]
    
    country = my[:1]['Aalborg'].values
    if len(country)>0:
        country_str = my[:1]['Aalborg'].values[0]

        Country_df.loc[Country_df["Nationality"] == item, "Nationality"] = country_str

MC_countries_count = pd.DataFrame(columns= ['Nation', 'Count'])

new_set = set(Country_df.Nationality)
for item in new_set:
    subCountry = Country_df[Country_df['Nationality'] == item]
    sum_acquisitions = subCountry['Count'].sum()
    MC_countries_count.loc[len(MC_countries_count.index)] = [item, sum_acquisitions]

def do_fuzzy_search(country):
    try:
        result = pycountry.countries.search_fuzzy(country)
        return result[0].alpha_3
    except:
        return np.nan

MC_countries_count["country_code"] = MC_countries_count["Nation"].apply(lambda country: do_fuzzy_search(country))

In [987]:
# create Rhizome df with country neames instead of nationality
natios_RH = set(df_rhz_nats.Nationality)
Country_df=df_rhz_nats.copy()
for item in natios_RH:
    my = correct_country[correct_country['Aalborgenser'] == item]
    
    country = my[:1]['Aalborg'].values
    if len(country)>0:
        country_str = my[:1]['Aalborg'].values[0]

        Country_df.loc[Country_df["Nationality"] == item, "Nationality"] = country_str

RH_countries_count = pd.DataFrame(columns= ['Nation', 'Count'])

new_set = set(Country_df.Nationality)
for item in new_set:
    subCountry = Country_df[Country_df['Nationality'] == item]
    sum_acquisitions = subCountry['Count'].sum()
    RH_countries_count.loc[len(RH_countries_count.index)] = [item, sum_acquisitions]

def do_fuzzy_search(country):
    try:
        result = pycountry.countries.search_fuzzy(country)
        return result[0].alpha_3
    except:
        return np.nan

RH_countries_count["country_code"] = RH_countries_count["Nation"].apply(lambda country: do_fuzzy_search(country))

### Plots: Choropleth Maps
##### Variables: Country, Artworks count, country code

In [988]:
#DF example 
MC_countries_count.head()

Unnamed: 0,Nation,Count,country_code
0,France,18702,FRA
1,Belgium,632,BEL
2,United States,35957,USA
3,Japan,125,JPN
4,United Kingdom,2134,GBR


### Moma complete DF

In [989]:
# create moma map plot

fig = go.Figure(data=go.Choropleth(
    locations = MC_countries_count['country_code'],
    z = MC_countries_count['Count'],
    text = MC_countries_count['Nation'],
    thick0=;
    colorscale=[
            [0,"#8e5a79"],
            [0.3 ,"#925f7d"],
            [0.325 ,"#966582"],
            [0.350 ,"#9a6a87"],
            [0.375,"#9e708b"],
            [0.4 ,"#a27590"],
            [0.525 ,"#b690a6"],
            [0.550 ,"#ba96ab"],
            [0.575,"#be9baf"],
            [0.6 ,"#c2a0b4"],
            [0.625 ,"#c6a6b8"],
            [0.650 ,"#caabbd"],
            [0.675,"#ceb1c2"],
            [0.7 ,"#d2b6c6"],
            [0.725 ,"#d6bccb"],
            [0.995 ,"#d9c1cf"],
            [1, "#FFFFFF"]],
    autocolorscale=False,
    reversescale=True,
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_title = 'Total artworks',
))

fig.update_layout(
    title_text='Moma artworks acquired after the 2000, spread over Nations',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='orthographic',
        showocean=True, oceancolor="LightBlue"
    ),
    height = 700,
)


fig.show()

SyntaxError: expression cannot contain assignment, perhaps you meant "=="? (2015783055.py, line 7)

### Rhizome DF

In [None]:
# create Rhizome map plot

fig = go.Figure(data=go.Choropleth(
    locations = RH_countries_count['country_code'],
    z = RH_countries_count['Count'],
    text = RH_countries_count['Nation'],
    colorscale=[
            [0,"#8e5a79"],
            [0.3 ,"#925f7d"],
            [0.325 ,"#966582"],
            [0.350 ,"#9a6a87"],
            [0.375,"#9e708b"],
            [0.4 ,"#a27590"],
            [0.525 ,"#b690a6"],
            [0.550 ,"#ba96ab"],
            [0.575,"#be9baf"],
            [0.6 ,"#c2a0b4"],
            [0.625 ,"#c6a6b8"],
            [0.650 ,"#caabbd"],
            [0.675,"#ceb1c2"],
            [0.7 ,"#d2b6c6"],
            [0.725 ,"#d6bccb"],
            [0.995 ,"#d9c1cf"],
            [1, "#FFFFFF"]],
    autocolorscale=False,
    reversescale=True,
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_title = 'Total artworks',
))

fig.update_layout(
    title_text='Rhizome total artworks number spread over nations',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='orthographic',
        showocean=True, oceancolor="LightBlue"
    ),
    height = 700,
)

fig.show()


### Plots: Stacked bar charts 
##### Variables: Nationality, Acquisition date, Artworks count for nationality

In [None]:
len(set(RH_working_final.Nationality))

59

In [None]:
#DF example 
RH_working_final.head()

Unnamed: 0,Nationality,DateAcquired,Decade,Count,Females,Males
137,German,2001,2000,7,1,6
157,Icelandic/American,2004,2000,1,0,1
158,Icelandic/American,2005,2000,1,0,1
159,Icelandic/American,2006,2000,1,0,1
160,Indian,2002,2000,1,0,1


In [None]:
#TO DOs:
# write down color patterns to keep them differentiated [need to do next step before to count the total colors needed] OK
# replot dataset any nationality that has count < 5 becomes 'Others' OK
# if we can do facets  we keep decades OK
# make them all barplots OK
# varible like an array the names of nats that are count < 5 to keeep them for later OK
#plots for both Galleries OK

#MOMA
fig11 = px.bar(MC_working_final, x="DateAcquired", y="Count", color="Nationality", color_discrete_sequence= momacolorscale,facet_col="Decade", facet_col_spacing=0.06,  facet_col_wrap=2, title="Artworks acquired by MoMA over decades split by nationality")
fig11.update_xaxes(matches=None, showticklabels=True)
fig11.update_yaxes(matches=None, showticklabels=True)
fig11.update_layout(height=1800)
fig11.show()

#RHZ
fig12 = px.bar(RH_working_final, x="DateAcquired", y="Count", color="Nationality",facet_col="Decade", color_discrete_sequence=rhzcolorscale, facet_col_spacing=0.06, facet_col_wrap=2, title="Artworks acquired by Rhizome over decades split by nationality")
fig12.update_xaxes(matches=None)
fig12.update_yaxes(matches=None, showticklabels=True)
fig12.update_layout(height=700)
fig12.show()

## 5a How does nationality representation within each dataset compare? 
-> across time OR total OR medium

Viz: 

1/moma pre v moma post w/ bar or column charts (maybe stacked bar chart) 

2/  rhizome v moma1983 w/ sampling (shrink down by 100%?) over time w/ line plot 


Story: ?? 



In [None]:
# create working df for MoMA datasets made for comparisons with rhizome create dictionary keeping saved the 'others' nationalities
#make count proportional dividing it by 10 OK

nationalities = ', '.join(moma_rhz_compare.Nationality)
nationalities = list(set(nationalities.split(', ')))
df_moma_rhz_compare = pd.DataFrame(columns= ["Nationality", "DateAcquired", "Count", "Females"]) 

OthersMomaComp = dict()
for nat in nationalities:
    nat_sub =  moma_rhz_compare[moma_rhz_compare.Nationality == nat]   
     
    nat_sub.DateAcquired = nat_sub.DateAcquired.astype('str')
    years = ', '.join(nat_sub.DateAcquired)
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = nat_sub[nat_sub.DateAcquired == year]
        entries_year = len(year_sub)
        f_count = len(year_sub[year_sub.Gender == 'F'])
        missing_count = len(year_sub[year_sub.Gender == ''])
        if entries_year < 10:
            if year not in OthersMomaComp:
                OthersMomaComp[year] = set([nat])
            else: 
                new = OthersMomaComp[year]
                new.add(nat)
                OthersMomaComp[year] = set(new)
            nationality = 'others'
            
        else: 
            nationality = nat
        df_moma_rhz_compare.loc[len(df_moma_rhz_compare.index)] = [nationality, year, entries_year, f_count]
  

df_moma_rhz_compare.DateAcquired = df_moma_rhz_compare.DateAcquired.astype('int')
df_moma_rhz_compare['Decade'] = 'todo'
df_moma_rhz_compare["Decade"] = df_moma_rhz_compare["DateAcquired"].apply(lambda date: getDecade(date))
# replot dataset any nationality that has count < 5 becomes 'Others'
MComparison_working = df_moma_rhz_compare[df_moma_rhz_compare.DateAcquired > 1929]
MComparison_working.DateAcquired.astype('int')
MComparison_working_final = MComparison_working.groupby(['Nationality','DateAcquired', 'Decade'],as_index=False).agg({'Count': 'sum','Females': 'sum'})
MComparison_working_final = MComparison_working_final.sort_values(by='Decade')
MComparison_working_final["Count"] = MComparison_working_final["Count"].apply(lambda count: count.div(10).round(2))

px.imshow()




A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

{'2000': {'Argentine',
  'Bahamian',
  'Belgian',
  'Brazilian',
  'Canadian',
  'Cuban',
  'Dutch',
  'Finnish',
  'Hungarian',
  'Icelandic',
  'Iranian',
  'Israeli',
  'Korean',
  'Pakistani',
  'South African',
  'Spanish',
  'Swiss',
  'Taiwanese',
  'Thai',
  'missing'},
 '2001': {'Austrian',
  'Bahamian',
  'Belgian',
  'Bosnian',
  'Canadian',
  'Cuban',
  'Danish',
  'Dutch',
  'French',
  'Hungarian',
  'Indian',
  'Italian',
  'Korean',
  'Mexican',
  'Nationality unknown',
  'Pakistani',
  'Polish',
  'Russian',
  'Scottish',
  'Spanish',
  'Swedish',
  'Swiss',
  'Turkish',
  'Ukrainian',
  'missing'},
 '2002': {'Argentine',
  'Austrian',
  'Belgian',
  'Brazilian',
  'Canadian',
  'Chinese',
  'Congolese',
  'Cuban',
  'Danish',
  'Dutch',
  'French',
  'Israeli',
  'Korean',
  'Mexican',
  'Polish',
  'South African',
  'Swedish',
  'missing'},
 '2003': {'Belgian',
  'Brazilian',
  'Canadian',
  'Chinese',
  'Cuban',
  'Dutch',
  'French',
  'Icelandic',
  'Israeli',
  

For comparison purposes the MoMa's dataset compared is a preprocessed one containing only artworks produced since 1983 and acquired from the museum since 2000. 

The counts have been cut down by 10% in order to male comparison even between thhe sets, as moma's for comparison is still 10 times biger than Rhizome's.

In [None]:
#create df with common nationalities for comparison 
nats_M = set(MComparison_working_final.Nationality)
nats_RH = set(df_rhz_nats.Nationality)
common_nats = nats_M & nats_RH

Moma_common_nats_df = pd.DataFrame()
Rh_common_nats_df = pd.DataFrame()
for item in common_nats:
    #moma
    moma = MComparison_working_final[MComparison_working_final.Nationality == item].drop('Females', axis = 1)
    Moma_common_nats_df = pd.concat([Moma_common_nats_df, moma])
    #rhizme
    rh = df_rhz_nats[df_rhz_nats.Nationality == item].drop('Females', axis = 1)
    Rh_common_nats_df = pd.concat([Rh_common_nats_df, rh])
    


Moma_common_nats_df = Moma_common_nats_df.rename(columns={'Count':'Moma'})
Rh_common_nats_df = Rh_common_nats_df.rename(columns={'Count':'Rhizome'})

workingDF = pd.concat([Moma_common_nats_df, Rh_common_nats_df], axis=0, ignore_index=True)
workingDF['Moma'] = workingDF['Moma'].fillna('0').astype('int')
workingDF['Rhizome'] = workingDF['Rhizome'].fillna('0').astype('int')
workingDF['DateAcquired'] = workingDF['DateAcquired'].astype('int')
workingDF = workingDF[workingDF.DateAcquired > 0]
workingDF_final = workingDF.groupby(['Nationality','DateAcquired'],as_index=False).agg({'Moma': 'sum','Rhizome': 'sum'})
workingDF_final = workingDF_final.sort_values(by='DateAcquired')

Unnamed: 0,Nationality,DateAcquired,Moma,Rhizome
0,American,2000,37,4
61,British,2000,1,1
80,Canadian,2000,0,1
132,French,2000,7,0
149,German,2000,1,0
...,...,...,...,...
172,Indian,2018,4,0
18,American,2018,33,0
94,Canadian,2019,1,0
208,Japanese,2019,1,0


### Plots: Line charts 
##### Variables: Nationality, Acquisition date, artworks aquired by MoMA, artworks aquired by Rhizome



In [None]:
#DF example
workingDF_final.head()

Unnamed: 0,Nationality,DateAcquired,Moma,Rhizome
0,American,2000,37,4
61,British,2000,1,1
80,Canadian,2000,0,1
132,French,2000,7,0
149,German,2000,1,0


### Comparison acquisition of artworks by nationality -- Moma after 1980 vs. Rhizome Collection by decades

In [None]:
#create subplots for nationalities and their figuresàuse them in the story as pictograms, 
# see if artists are related and subgroup nationalities in Global-noth (colonizers) and gglobal south (colonised pops) 
# and is also interesting to look at exact matches


fig = px.line(workingDF_final, x="DateAcquired", y=['Moma', 'Rhizome'], facet_col='Nationality', facet_col_wrap=2,
              facet_row_spacing=0.02, # default is 0.07 when facet_col_wrap is used
              facet_col_spacing=0.04, # default is 0.03
              height=2800, width=1100,
              markers=True,
              color_discrete_map={"Count Moma": "#456987","Count Rhizome": "#147852"},
              title="Comparision of acquisitions over the year between nationalities, both in MoMa and Rhizome's data sets")

fig.update_xaxes(matches=None, showticklabels=True)
fig.update_yaxes(matches=None, showticklabels=True)
fig.show()

Find the artists in the original dfs and see if if there are some interesting matches 

## 5b How does nationality representation relate to gender? Rhizome vs MoMA 1983+ / MoMA pre-83 v MoMA post-83

--> gender proportions for each nationality across time OR total OR medium 

Viz: bubble charts (Chiara) w/ male / female / collectives across time + total  

Story: maybe similar to 4? 


### Plots: Columns stacked bar charts 
##### Variables: Acquisition date, Nationality, Gender, Artworks for Gender 

In [None]:
#DF example 
RH_working_final.head()

Unnamed: 0,Nationality,DateAcquired,Decade,Count,Females,Males
137,German,2001,2000,7,1,6
157,Icelandic/American,2004,2000,1,0,1
158,Icelandic/American,2005,2000,1,0,1
159,Icelandic/American,2006,2000,1,0,1
160,Indian,2002,2000,1,0,1


In [None]:
fig11 = px.bar(RH_working_final, x="DateAcquired", y="Count", color="   12", color_discrete_sequence= momacolorscale,facet_col="Decade", facet_col_spacing=0.06,  facet_col_wrap=2, title="Artworks acquired by MoMA over decades split by nationality")
fig11.update_xaxes(matches=None, showticklabels=True)
fig11.update_yaxes(matches=None, showticklabels=True)
fig11.update_layout(height=1800)
fig11.show()


### Plots: Bubble charts charts 
##### Variables: Acquisition date, Nationality, Gender, Artworks for Gender 

Make the same scatters as Marghe did but with nationalities + scattered plot below in Q7 try with 18 commons or 10

In [None]:
fig = px.scatter(group_counts, y="Gender", x="DateAcquired", color="DbProv", size="sizeXgroup", facet_col='Source')

fig.update_xaxes(tick0=1980)

fig.show()

### Moma old

In [None]:
# create subset for females 
old = moma_artworks_old.copy()

nationalities = ', '.join(old.Nationality)
nationalities = list(set(nationalities.split(', ')))
df_MO_nats_F = pd.DataFrame(columns= ["Nationality", "DateAcquired", "Count", "Gender"]) 

for nat in nationalities:
    nat_sub =  old[old.Nationality == nat]
    sub_f = nat_sub[nat_sub.Gender == 'F']   
    sub_f.DateAcquired = sub_f.DateAcquired.astype('str')
    years = ', '.join(sub_f.DateAcquired)
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = sub_f[sub_f.DateAcquired == year]
        f_count = len(year_sub[year_sub.Gender == 'F'])
        df_MO_nats_F.loc[len(df_MO_nats_F.index)] = [nat, year, f_count, 'F']

df_MO_nats_F.DateAcquired = df_MO_nats_F.DateAcquired.astype('int')
df_MO_nats_F = df_MO_nats_F[df_MO_nats_F.DateAcquired >0]



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

In [None]:
#create subset for males 
df_MO_nats_M = pd.DataFrame(columns= ["Nationality", "DateAcquired", "Count", "Gender"]) 

for nat in nationalities:
    nat_sub =  old[old.Nationality == nat]
    sub_m = nat_sub[nat_sub.Gender == 'M']   
    sub_m.DateAcquired = sub_m.DateAcquired.astype('str')
    years = ', '.join(sub_m.DateAcquired)
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = sub_m[sub_m.DateAcquired == year]
        m_count = len(year_sub[year_sub.Gender == 'M'])
        df_MO_nats_M.loc[len(df_MO_nats_M.index)] = [nat, year, m_count, 'M']

df_MO_nats_M.DateAcquired = df_MO_nats_M.DateAcquired.astype('int')
df_MO_nats_M = df_MO_nats_M[df_MO_nats_M.DateAcquired >0]



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

In [None]:
Moma_old_MFnat

Unnamed: 0,Nationality,DateAcquired,Count,Gender
0,Italian,1944,1,F
1,Italian,1949,3,F
2,Italian,1959,2,F
3,Italian,1963,1,F
4,Italian,1964,9,F
...,...,...,...,...
1813,Guatemalan,1959,1,M
1814,Guatemalan,1961,1,M
1815,Guatemalan,1964,2,M
1816,Guatemalan,1966,1,M


In [None]:
#plot bubble charts with acquisition over decades for moma old
Moma_old_MFnat = pd.concat([df_MO_nats_F,df_MO_nats_M])
df =Moma_old_MFnat[(Moma_old_MFnat['DateAcquired'] >= 2000) & (Moma_old_MFnat['DateAcquired']) <= 2010 & (Moma_old_MFnat['Count'] >10)]


fig41 = px.scatter(df,
                 x="Nationality", y="DateAcquired", size="Count", color="Gender" ,
                 opacity = 0.3,
                 title="MoMA's created before 1980",

             color_discrete_sequence=['red', 'blue'],
                width= 900,
                 height= 700)

fig41.update_layout(
    paper_bgcolor='rgb(255, 255, 255)',
    plot_bgcolor='rgb(243, 243, 243)',
    )

fig41.show()
# cut down nats < 10 that become other 
# try duble columns stacked barplot instead  


### Moma new

In [None]:
# create subset for females 
new = moma_artworks_new.copy()

nationalities = ', '.join(new.Nationality)
nationalities = list(set(nationalities.split(', ')))
df_MN_nats_F = pd.DataFrame(columns= ["Nationality", "DateAcquired", "Count", "Gender"]) 

for nat in nationalities:
    nat_sub =  new[new.Nationality == nat]
    sub_f = nat_sub[nat_sub.Gender == 'F']   
    sub_f.DateAcquired = sub_f.DateAcquired.astype('str')
    years = ', '.join(sub_f.DateAcquired)
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = sub_f[sub_f.DateAcquired == year]
        f_count = len(year_sub[year_sub.Gender == 'F'])
        df_MN_nats_F.loc[len(df_MN_nats_F.index)] = [nat, year, f_count, 'F']

df_MN_nats_F.DateAcquired = df_MN_nats_F.DateAcquired.astype('int')
df_MN_nats_F = df_MN_nats_F[df_MN_nats_F.DateAcquired >0]



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

In [None]:
#create subset for males 
df_MN_nats_M = pd.DataFrame(columns= ["Nationality", "DateAcquired", "Count", "Gender"]) 

for nat in nationalities:
    nat_sub =  old[old.Nationality == nat]
    sub_m = nat_sub[nat_sub.Gender == 'M']   
    sub_m.DateAcquired = sub_m.DateAcquired.astype('str')
    years = ', '.join(sub_m.DateAcquired)
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = sub_m[sub_m.DateAcquired == year]
        m_count = len(year_sub[year_sub.Gender == 'M'])
        df_MN_nats_M.loc[len(df_MN_nats_M.index)] = [nat, year, m_count, 'M']

df_MN_nats_M.DateAcquired = df_MN_nats_M.DateAcquired.astype('int')
df_MN_nats_M = df_MN_nats_M[df_MN_nats_M.DateAcquired >0]



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

In [None]:
#plot bubble charts with acquisition over decades for moma old
Moma_new_MFnat = pd.concat([df_MN_nats_F,df_MN_nats_M])
df =Moma_new_MFnat[(Moma_new_MFnat['DateAcquired'] >= 2000) & (Moma_new_MFnat['DateAcquired'] <= 2010)]


fig42 = px.scatter(df,
                 x="DateAcquired", y="Nationality", size="Count", color="Gender" ,
                 opacity = 0.3,
                 title="MoMA's created after 1980",

             color_discrete_sequence=['red', 'blue'],
                 width= 900,
                 height= 700)

fig42.update_layout(
    paper_bgcolor='rgb(255, 255, 255)',
    plot_bgcolor='rgb(243, 243, 243)',
    )

fig42.show()

### Moma complete

In [None]:
# create subset for females 
complete = pd.concat([old,new])
nationalities = ', '.join(complete.Nationality)
nationalities = list(set(nationalities.split(', ')))
df_MC_nats_F = pd.DataFrame(columns= ["Nationality", "DateAcquired", "Count", "Gender"]) 

for nat in nationalities:
    nat_sub =  complete[complete.Nationality == nat]
    sub_f = nat_sub[nat_sub.Gender == 'F']   
    sub_f.DateAcquired = sub_f.DateAcquired.astype('str')
    years = ', '.join(sub_f.DateAcquired)
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = sub_f[sub_f.DateAcquired == year]
        f_count = len(year_sub[year_sub.Gender == 'F'])
        df_MC_nats_F.loc[len(df_MC_nats_F.index)] = [nat, year, f_count, 'F']

df_MC_nats_F.DateAcquired = df_MC_nats_F.DateAcquired.astype('int')
df_MC_nats_F = df_MC_nats_F[df_MC_nats_F.DateAcquired >0]



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

In [None]:
#create subset for males 
df_MC_nats_M = pd.DataFrame(columns= ["Nationality", "DateAcquired", "Count", "Gender"]) 

for nat in nationalities:
    nat_sub =  complete[complete.Nationality == nat]
    sub_m = nat_sub[nat_sub.Gender == 'M']   
    sub_m.DateAcquired = sub_m.DateAcquired.astype('str')
    years = ', '.join(sub_m.DateAcquired)
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = sub_m[sub_m.DateAcquired == year]
        m_count = len(year_sub[year_sub.Gender == 'M'])
        df_MC_nats_M.loc[len(df_MC_nats_M.index)] = [nat, year, m_count, 'M']

df_MC_nats_M.DateAcquired = df_MC_nats_M.DateAcquired.astype('int')
df_MC_nats_M = df_MC_nats_M[df_MC_nats_M.DateAcquired >0]



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

In [None]:
#plot bubble charts with acquisition over decades for moma complete
Moma_complet_MFnat = pd.concat([df_MC_nats_F,df_MC_nats_M])
df =Moma_complet_MFnat[(Moma_complet_MFnat['DateAcquired'] >= 2000) & (Moma_complet_MFnat['DateAcquired'] <= 2010)]


fig43 = px.scatter(df,
                 x="DateAcquired", y="Nationality", size="Count", color="Gender" ,
                 opacity = 0.3,
                 title="MoMA's complete",

             color_discrete_sequence=['red', 'blue'],
                 width= 900,
                 height= 700)

fig43.update_layout(
    paper_bgcolor='rgb(255, 255, 255)',
    plot_bgcolor='rgb(243, 243, 243)',
    )

fig43.show()

### Rhizome

In [None]:
df_R_nats_F

Unnamed: 0,Nationality,dateAcquired,Count,Gender
0,Italian,2002,3,F
1,Italian,2005,5,F
2,Italian,2007,1,F
3,Dutch,2001,1,F
4,Dutch,2003,1,F
5,Dutch,2004,7,F
6,Dutch,2005,4,F
7,Dutch,2006,4,F
8,Dutch,2007,1,F
9,Dutch,2008,2,F


In [None]:
# create subset for females 
rh = rhz_artworks.copy()

nationalities = ', '.join(rh.Nationality)
nationalities = list(set(nationalities.split(', ')))
df_R_nats_F = pd.DataFrame(columns= ["Nationality", "dateAcquired", "Count", "Gender"]) 

for nat in nationalities:
    nat_sub =  rh[rh.Nationality == nat]
    sub_f = nat_sub[nat_sub.Gender == 'F']   
    sub_f.dateAcquired = sub_f.dateAcquired.astype('str')
    years = ', '.join(sub_f.dateAcquired)
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = sub_f[sub_f.dateAcquired == year]
        f_count = len(year_sub[year_sub.Gender == 'F'])
        df_R_nats_F.loc[len(df_R_nats_F.index)] = [nat, year, f_count, 'F']

df_R_nats_F.dateAcquired = df_R_nats_F.dateAcquired.astype('int')
df_R_nats_F = df_R_nats_F[df_R_nats_F.dateAcquired >0]



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

In [None]:
#create subset for males 
df_R_nats_M = pd.DataFrame(columns= ["Nationality", "dateAcquired", "Count", "Gender"]) 

for nat in nationalities:
    nat_sub =  rh[rh.Nationality == nat]
    sub_m = nat_sub[nat_sub.Gender == 'M']   
    sub_m.dateAcquired = sub_m.dateAcquired.astype('str')
    years = ', '.join(sub_m.dateAcquired)
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = sub_m[sub_m.dateAcquired == year]
        m_count = len(year_sub[year_sub.Gender == 'M'])
        df_R_nats_M.loc[len(df_R_nats_M.index)] = [nat, year, m_count, 'M']

df_R_nats_M.dateAcquired = df_R_nats_M.dateAcquired.astype('int')
df_R_nats_M = df_R_nats_M[df_R_nats_M.dateAcquired >0]



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

In [None]:
#plot bubble charts with acquisizion over decades for moma old
Rhizome_MFnat = pd.concat([df_R_nats_F,df_R_nats_M])
Rhizome_MFnat['dateAcquired']=Rhizome_MFnat['dateAcquired'].fillna(0).astype(int)
Rhizome_MFnat
df =Rhizome_MFnat[(Rhizome_MFnat['dateAcquired'] >= 2000) & (Rhizome_MFnat['dateAcquired'] <= 2010)]


fig43 = px.scatter(df,
                 x="dateAcquired", y="Nationality", size="Count", color="Gender" ,
                 opacity = 0.3,
                 title="Rhizome's created after 1980",

             color_discrete_sequence=['red', 'blue'],
                 width= 900,
                 height= 700)

fig43.update_layout(
    paper_bgcolor='rgb(255, 255, 255)',
    plot_bgcolor='rgb(243, 243, 243)',
    )

fig43.show()

## 6 What are the most prominent medium in each dataset by gender and/or nationality? Rhizome vs MoMA 1983+ / MoMA pre-83 v MoMA post-83

--> simple count OR percentage of total  

-> how do these compare? 

-> compare only Rhizome vs MoMA media & performance / film 

Viz: pie charts/pictograms for counts OR parallel set for m/f <-> medium 
Story: TBC 



### Plots: Pie charts 
##### Variables: Nationality, Medium, count for Department

In [None]:
def search_nat(id):
    subset = moma_artists[moma_artists.ID == id]
    if len(subset)>=1:
        nat = subset['Nationality'].values[0]
        return nat
    else:
        return 'missing'

In [None]:
#create dfs for departments
ArchitectureDesign = moma_artworks[moma_artworks.Department == 'Architecture & Design']
ArchitectureDesign.ConstituentID = ArchitectureDesign.ConstituentID.fillna('0')
ArchitectureDesign['Nationality'] = ArchitectureDesign["ConstituentID"].apply(lambda id: search_nat(id))

Architecture_Design_Images = moma_artworks[moma_artworks.Department == 'Architecture & Design - Image Archive']
Architecture_Design_Images.ConstituentID = Architecture_Design_Images.ConstituentID.fillna('0')
Architecture_Design_Images['Nationality'] = Architecture_Design_Images["ConstituentID"].apply(lambda id: search_nat(id))

Drawings_Prints = moma_artworks[moma_artworks.Department == 'Drawings & Prints']
Drawings_Prints.ConstituentID = Drawings_Prints.ConstituentID.fillna('0')
Drawings_Prints['Nationality'] = Drawings_Prints["ConstituentID"].apply(lambda id: search_nat(id))

Film = moma_artworks[moma_artworks.Department == 'Film']
Film.ConstituentID = Film.ConstituentID.fillna('0')
Film['Nationality'] = Film["ConstituentID"].apply(lambda id: search_nat(id))

Fluxus_Collection = moma_artworks[moma_artworks.Department == 'Fluxus Collection']
Fluxus_Collection.ConstituentID = Fluxus_Collection.ConstituentID.fillna('0')
Fluxus_Collection['Nationality'] = Fluxus_Collection["ConstituentID"].apply(lambda id: search_nat(id))

Media_Performance = moma_artworks[moma_artworks.Department == 'Media and Performance']
Media_Performance.ConstituentID = Media_Performance.ConstituentID.fillna('0')
Media_Performance['Nationality'] = Media_Performance["ConstituentID"].apply(lambda id: search_nat(id))

Painting_Sculpture = moma_artworks[moma_artworks.Department == 'Painting & Sculpture']
Painting_Sculpture.ConstituentID = Painting_Sculpture.ConstituentID.fillna('0')
Painting_Sculpture['Nationality'] = Painting_Sculpture["ConstituentID"].apply(lambda id: search_nat(id))

Photography = moma_artworks[moma_artworks.Department == 'Photography']
Photography.ConstituentID = Photography.ConstituentID.fillna('0')
Photography['Nationality'] = Photography["ConstituentID"].apply(lambda id: search_nat(id))




A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

### Moma complete Departments

relationship media department nationalities

In [None]:
#plot piechart for count of nationalities 
set(Media_Performance.Nationality)


{'Afghan',
 'Albanian',
 'Algerian',
 'American',
 'Argentine',
 'Australian',
 'Austrian',
 'Bangladeshi',
 'Belgian',
 'Brazilian',
 'British',
 'Canadian',
 'Chilean',
 'Chinese',
 'Colombian',
 'Croatian',
 'Cuban',
 'Czech',
 'Danish',
 'Dutch',
 'Egyptian',
 'Finnish',
 'French',
 'Georgian',
 'German',
 'Guatemalan',
 'Hungarian',
 'Icelandic',
 'Indian',
 'Iranian',
 'Irish',
 'Israeli',
 'Italian',
 'Japanese',
 'Korean',
 'Lebanese',
 'Lithuanian',
 'Mexican',
 'Moroccan',
 'Native American',
 'New Zealander',
 'Pakistani',
 'Palestinian',
 'Peruvian',
 'Polish',
 'Romanian',
 'Russian',
 'Scottish',
 'Singaporean',
 'Slovak',
 'South African',
 'Spanish',
 'Swedish',
 'Swiss',
 'Taiwanese',
 'Turkish',
 'Ukrainian',
 'Uruguayan',
 'Yugoslav',
 'missing'}

# Plotly Visualizations for gender and nationality comparison

In [None]:
go.Heatmap()