# Initial research questions

* Comparative analysis of gender representation in artwork creation between born-digital and analogue art collections
* Potentially, some additional detail on medium, location…
* Semantic analysis of the narrative about the artworks: which keywords are associated with different artwork types?

### The story could be:
The internet was supposed to revolutionize things, so how did it do when looking at who makes art and who gets included in collections?


A simple way to plan your work is:

 * choose the research question
 * map the question to the pieces of information needed to answer it (e.g. periods, counts)
 * map the data to specific data types (categorical, numerical, ordinal)
 * choose the plot(s) that best help you visualise a pattern (e.g. a bar chart)
 * get your data in some form (SPARQL query results)
 * filter/manipulate your data (select the variables that matter, perform operations such as counting)
 * create a data structure that fits the plotting requirements (a table, a JSON etc.), including the number of variables needed (e.g. one categorical and one numerical)
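
As a minimal sketch of the plan above, assuming a toy table whose column names and values are made up for illustration (they are not the MoMA or Rhizome data), the following goes from raw records to a structure with one categorical and one numerical variable, which is what a bar chart needs:

```python
import pandas as pd

# Toy records standing in for query results; all values are illustrative only.
records = pd.DataFrame({
    "collection": ["analogue", "analogue", "digital", "digital", "digital"],
    "gender": ["M", "F", "F", "M", "F"],
})

# Filter to the variable of interest, then count per category.
# Result: one categorical column (collection) and one numerical column (count).
bar_data = (
    records[records["gender"] == "F"]
    .groupby("collection")
    .size()
    .reset_index(name="works_by_women")
)
print(bar_data)
```

A table in this shape could then be passed to a bar-chart function with `collection` on one axis and `works_by_women` on the other.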


In [1]:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
from plotly.subplots import make_subplots
import pycountry
import numpy as np

In [2]:
#IMPORT DATASETS
artists = pd.read_pickle('MOMA_data/pickle/MoMAartists.pkl')
before80 = pd.read_pickle('MOMA_data/pickle/old_artworks.pkl')
after80 = pd.read_pickle('MOMA_data/pickle/new_artworks.pkl')
rhizome = pd.read_pickle('Rhizome_data/rhizome_artworks_extra.pkl')

# Plotly Visualizations for gender and nationality comparison

## 1 Acquisition overview of MoMA and Rhizome

Count of how many artworks for each country have been added to the MoMA (new and old) and Rhizome collections

### MoMA artworks created before 1980

In [3]:
before80

Unnamed: 0,Title,Artist,ID,DateCreated,Medium,Department,DateAcquired,URL,ThumbnailURL,Nationality,Gender
0,"Ferdinandsbrücke Project, Vienna, Austria (Ele...",Otto Wagner,6210,1896,Ink and cut-and-pasted painted pages on paper,Architecture & Design,1996,http://www.moma.org/collection/works/2,http://www.moma.org/media/W1siZiIsIjU5NDA1Il0s...,Austrian,M
2,"Villa near Vienna Project, Outside Vienna, Aus...",Emil Hoppe,7605,1903,"Graphite, pen, color pencil, ink, and gouache ...",Architecture & Design,1997,http://www.moma.org/collection/works/4,http://www.moma.org/media/W1siZiIsIjk4Il0sWyJw...,Austrian,M
4,"Villa, project, outside Vienna, Austria, Exter...",Emil Hoppe,7605,1903,"Graphite, color pencil, ink, and gouache on tr...",Architecture & Design,1997,http://www.moma.org/collection/works/6,http://www.moma.org/media/W1siZiIsIjEyNiJdLFsi...,Austrian,M
5,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1976,Gelatin silver photograph,Architecture & Design,1995,http://www.moma.org/collection/works/7,http://www.moma.org/media/W1siZiIsIjE0OCJdLFsi...,missing,M
6,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1976,Gelatin silver photographs,Architecture & Design,1995,http://www.moma.org/collection/works/8,http://www.moma.org/media/W1siZiIsIjE0OSJdLFsi...,missing,M
...,...,...,...,...,...,...,...,...,...,...,...
138146,Untitled,"Chesnutt Brothers Studio, Andrew Chesnutt, Lew...","133005, 133006, 133007",1890,Gelatin silver print,Photography,2020,http://www.moma.org/collection/works/418928,http://www.moma.org/media/W1siZiIsIjQ5MjcyMiJd...,"missing, American, American","missing, M, M"
138147,Plate (folio 2 verso) from Muscheln und schirm...,Sophie Taeuber-Arp,5777,1939,One from an illustrated book with four line bl...,Drawings & Prints,2019,http://www.moma.org/collection/works/419286,http://www.moma.org/media/W1siZiIsIjQ4NTExNSJd...,Swiss,F
138148,Plate (folio 6) from Muscheln und schirme (She...,Sophie Taeuber-Arp,5777,1939,One from an illustrated book with four line bl...,Drawings & Prints,2019,http://www.moma.org/collection/works/419287,http://www.moma.org/media/W1siZiIsIjQ4NTExOCJd...,Swiss,F
138149,Plate (folio 12) from Muscheln und schirme (Sh...,Sophie Taeuber-Arp,5777,1939,One from an illustrated book with four line bl...,Drawings & Prints,2019,http://www.moma.org/collection/works/419288,http://www.moma.org/media/W1siZiIsIjQ4NTEyMCJd...,Swiss,F


In [4]:

before80.Nationality = before80.Nationality.astype('str')
nats = ', '.join(before80.Nationality)
nats = list(set(nats.split(', ')))
df_nats_before = pd.DataFrame()

for country in nats:
    # .copy() avoids pandas' SettingWithCopyWarning on the assignment below
    nat_sub = before80[before80.Nationality == country].copy()
    nat_sub.DateAcquired = nat_sub.DateAcquired.astype('str')
    years = ', '.join(nat_sub.DateAcquired)
    # 4-character year strings, sorted; the last sorted entry is dropped
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = nat_sub[nat_sub.DateAcquired == year]
        entries_year = len(year_sub)
        # only rows whose Gender is exactly 'F' count as works by a female artist
        f_count = len(year_sub[year_sub.Gender == 'F'])
        new_row = pd.DataFrame([[country, year, entries_year, f_count]])
        df_nats_before = pd.concat([df_nats_before, new_row], axis=0, ignore_index=True)
df_nats_before.columns= ["Nation", "DateAcquired", "Count", "Females"]
df_nats_before.DateAcquired = df_nats_before.DateAcquired.astype('int')
df_nats_before


Unnamed: 0,Nation,DateAcquired,Count,Females
0,Argentine,1942,27,7
1,Argentine,1943,1,0
2,Argentine,1954,29,0
3,Argentine,1956,2,0
4,Argentine,1957,2,1
...,...,...,...,...
2021,Croatian,2016,1,0
2022,Croatian,2018,2,0
2023,Croatian,2019,2,0
2024,Sudanese,1965,1,0
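
The per-nation, per-year counting done by the loop can also be expressed with a pandas `groupby`. The sketch below is an alternative under stated assumptions, not the notebook's method: it runs on a toy table (values are illustrative only), keeps only rows with a single nationality because the loop matches nationalities exactly, and does not reproduce the loop's dropping of the last sorted year.

```python
import pandas as pd

# Toy stand-in for the artworks table; values are illustrative only.
works = pd.DataFrame({
    "Nationality": ["Swiss", "Swiss", "Austrian", "Swiss", "missing, American"],
    "DateAcquired": [2019, 2019, 1996, 1939, 2020],
    "Gender": ["F", "M", "M", "F", "missing, M"],
})

# Keep rows with a single nationality, mirroring the exact-match filter of the loop.
single = works[~works["Nationality"].str.contains(", ")]

summary = (
    single.assign(is_female=single["Gender"].eq("F"))
    .groupby(["Nationality", "DateAcquired"], as_index=False)
    .agg(Count=("Gender", "size"), Females=("is_female", "sum"))
    .rename(columns={"Nationality": "Nation"})
)
print(summary)
```

The result has the same four columns as the table above (`Nation`, `DateAcquired`, `Count`, `Females`), computed in one pass instead of one `concat` per row.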


In [5]:
import plotly.express as px
fig1 = px.area(df_nats_before, x="DateAcquired", y="Count", color="Nation", line_group="Nation")
fig1.update_layout(
    xaxis = dict(
                
        tickmode = 'linear',

        dtick = 10
    ),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)',
)



   
fig1.show()

### MoMA artworks created after 1980

In [6]:
after80

Unnamed: 0,Title,Artist,ID,DateCreated,Medium,Department,DateAcquired,URL,ThumbnailURL,Nationality,Gender
1,"City of Music, National Superior Conservatory ...",Christian de Portzamparc,7470,1987,Paint and colored pencil on print,Architecture & Design,1995,http://www.moma.org/collection/works/3,http://www.moma.org/media/W1siZiIsIjk3Il0sWyJw...,French,M
3,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1980,Photographic reproduction with colored synthet...,Architecture & Design,1995,http://www.moma.org/collection/works/5,http://www.moma.org/media/W1siZiIsIjEyNCJdLFsi...,missing,M
31,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1980,Photographic reproduction with colored synthet...,Architecture & Design,1995,http://www.moma.org/collection/works/33,http://www.moma.org/media/W1siZiIsIjIwMCJdLFsi...,missing,M
35,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1980,Photographic reproduction with colored synthet...,Architecture & Design,1995,http://www.moma.org/collection/works/38,http://www.moma.org/media/W1siZiIsIjI2NyJdLFsi...,missing,M
40,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1980,Ink on tracing paper,Architecture & Design,1995,http://www.moma.org/collection/works/44,http://www.moma.org/media/W1siZiIsIjI5NiJdLFsi...,missing,M
...,...,...,...,...,...,...,...,...,...,...,...
138114,Cóctel (Cocktail),Alejandro Kuropatwa,132939,1996,Chromogenic color print,Photography,2020,missing,missing,Argentine,missing
138115,Cóctel (Cocktail),Alejandro Kuropatwa,132939,1996,Chromogenic color print,Photography,2020,missing,missing,Argentine,missing
138116,Cóctel (Cocktail),Alejandro Kuropatwa,132939,1996,Chromogenic color print,Photography,2020,missing,missing,Argentine,missing
138117,Cóctel (Cocktail),Alejandro Kuropatwa,132939,1996,Chromogenic color print,Photography,2020,missing,missing,Argentine,missing


In [7]:
after80.Nationality = after80.Nationality.astype('str')
nats = ', '.join(after80.Nationality)
nats = list(set(nats.split(', ')))
df_nats_after = pd.DataFrame()

for country in nats:
    # .copy() avoids pandas' SettingWithCopyWarning on the assignment below
    nat_sub = after80[after80.Nationality == country].copy()
    nat_sub.DateAcquired = nat_sub.DateAcquired.astype('str')
    years = ', '.join(nat_sub.DateAcquired)
    # 4-character year strings, sorted; the last sorted entry is dropped
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = nat_sub[nat_sub.DateAcquired == year]
        entries_year = len(year_sub)
        # only rows whose Gender is exactly 'F' count as works by a female artist
        f_count = len(year_sub[year_sub.Gender == 'F'])
        new_row = pd.DataFrame([[country, year, entries_year, f_count]])
        df_nats_after = pd.concat([df_nats_after, new_row], axis=0, ignore_index=True)

df_nats_after.columns= ["Nation", "DateAcquired", "Count", "Females"]
df_nats_after.DateAcquired = df_nats_after.DateAcquired.astype('int')
df_nats_after



Unnamed: 0,Nation,DateAcquired,Count,Females
0,Argentine,1986,3,0
1,Argentine,1987,1,0
2,Argentine,1992,1,1
3,Argentine,1993,1,0
4,Argentine,1994,1,0
...,...,...,...,...
1003,Croatian,2014,1,0
1004,Croatian,2015,1,0
1005,Emirati,2019,5,5
1006,Bahamian,2000,1,1


In [8]:
import plotly.express as px
fig3 = px.area(df_nats_after, x="DateAcquired", y="Count", color="Nation", line_group="Nation")
fig3.update_layout(
    xaxis = dict(

        tickmode = 'linear',
        tick0 = 1980,
        dtick = 2
    ),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)',
)

   
fig3.show()


### Rhizome artworks

In [9]:
rhizome

Unnamed: 0,ID,URL,Title,Artist,dateAcquired,dateCreated,Nationality,Gender
0,879,https://artbase.rhizome.org/wiki/Q2423,ZUR FARBENLEHRE (THEORY OF COLOURS),Steven Jones,2007,2007,British,M
1,1020,https://artbase.rhizome.org/wiki/Q4089,Zones de Convergence,cicero,2005,2005,missing,missing
2,"243, 701",https://artbase.rhizome.org/wiki/Q1475,Zombie and Mummy,"Dragan Espenschied, Olia Lialina",2004,2002,"German, Russian","M, F"
3,312,https://artbase.rhizome.org/wiki/Q4374,"Zaira, City of Memories",Gokcen Erguven,2004,2004,Turkish,F
4,920,https://artbase.rhizome.org/wiki/Q3972,Z_G [zeitgeist gestalten],Tiago Borges,2008,2007,Angolan,M
...,...,...,...,...,...,...,...,...
2265,1075,https://artbase.rhizome.org/wiki/Q4358,1999,joan escofet,2001,2000,missing,missing
2266,771,https://artbase.rhizome.org/wiki/Q3761,1969,Rhea Myers,2004,2004,British,F
2267,859,https://artbase.rhizome.org/wiki/Q2283,1953,Skye Thorstenson,2003,2002,missing,M
2268,481,https://artbase.rhizome.org/wiki/Q2511,160,Katie Lips,2005,2005,British,F


In [10]:

rhizome.Nationality = rhizome.Nationality.astype('str')
nats = ', '.join(rhizome.Nationality)
nats = list(set(nats.split(', ')))
df_nats_rhizome = pd.DataFrame()

for country in nats:
    nat_sub = rhizome[rhizome.Nationality == country]
    # Rhizome's column is 'dateAcquired'. Keep the years in a separate Series:
    # assigning to a new attribute (nat_sub.DateAcquired) does not create a column
    acquired = nat_sub.dateAcquired.astype('str')
    years = ', '.join(acquired)
    # 4-character year strings, sorted; the last sorted entry is dropped
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = nat_sub[acquired == year]
        entries_year = len(year_sub)
        f_count = len(year_sub[year_sub.Gender == 'F'])
        new_row = pd.DataFrame([[country, year, entries_year, f_count]])
        df_nats_rhizome = pd.concat([df_nats_rhizome, new_row], axis=0, ignore_index=True)

df_nats_rhizome.columns= ["Nation", "DateAcquired", "Count", "Females"]
df_nats_rhizome



Unnamed: 0,Nation,DateAcquired,Count,Females
0,Italian,2001,1,0
1,Argentine,2001,2,0
2,Argentine,2002,2,1
3,Argentine,2003,2,0
4,Argentine,2004,6,4
...,...,...,...,...
281,Israeli,2006,1,0
282,Israeli,2007,1,1
283,Israeli,2008,3,0
284,Croatian,2003,1,1


In [11]:
df_nats_before.sort_values( by=["Count"], ascending=True)


Unnamed: 0,Nation,DateAcquired,Count,Females
1012,Dutch,2003,1,0
713,Danish,1963,1,0
1847,Hungarian,2009,1,0
717,Danish,1967,1,0
720,Danish,1970,1,0
...,...,...,...,...
526,American,1969,1558,154
565,American,2008,2542,200
531,American,1974,2616,441
1104,French,1968,5225,3


In [12]:
fig4 = px.area(df_nats_rhizome, x="DateAcquired", y="Count", color="Nation", line_group="Nation")
fig4.update_layout(
     xaxis = dict(
        tick0 = 2001,
        dtick = 2
    ),
    paper_bgcolor='rgb(243, 243, 243)',
    plot_bgcolor='rgb(243, 243, 243)',
   
)
fig4.show()

## Acquisition of artworks by female artists over total -- MoMA and Rhizome

Count of how many artworks for each country have been added to the MoMA (new and old) and Rhizome collections, compared with the number of acquired artworks made solely by a female artist (rows whose `Gender` value is exactly `F`).

The visualization uses categorical data (gender, nationality), ordinal data (acquisition year) and numerical data (total artworks per nationality, artworks by a female artist per nationality).
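
To compare female-authored acquisitions against the total, one option is to derive a percentage from the two numerical variables. The sketch below assumes a table shaped like `df_nats_before`; the three rows reuse values that appear in the sorted output above, and the `FemaleShare` column name is hypothetical.

```python
import pandas as pd

# Rows in the shape of df_nats_before (Nation, DateAcquired, Count, Females).
counts = pd.DataFrame({
    "Nation": ["American", "American", "French"],
    "DateAcquired": [1969, 1974, 1968],
    "Count": [1558, 2616, 5225],
    "Females": [154, 441, 3],
})

# Percentage of that nation's acquisitions in that year made by a female artist.
counts["FemaleShare"] = (counts["Females"] / counts["Count"] * 100).round(1)
print(counts)
```

A share makes years with very different totals comparable, which raw bubble sizes do not.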

In [13]:
# 1970-1980 subset (not used by the plot below)
df =df_nats_before[(df_nats_before['DateAcquired'] >= 1970) & (df_nats_before['DateAcquired'] <= 1980)]
# keep only nation/year rows with more than 100 acquired artworks
df2 =df_nats_before.loc[df_nats_before['Count'] > 100]

fig5 = px.scatter(df2,
                 x="Count", y="DateAcquired", size="Females", color="Nation",
                 log_x=True,
                 title="MoMA artworks created before 1980")
# fig5.update_traces(
#     tickformatstops = {
#         'dtickrange': '[100,10000]'
#     }
# )
fig5.update_layout(
    paper_bgcolor='rgb(255, 255, 255)',
    plot_bgcolor='rgb(243, 243, 243)',
    )


fig5.show()

TODO: add a shape for male artists.

In [14]:
fig6 = px.scatter(df_nats_after,
                 x="Count", y="DateAcquired", size="Females", color="Nation",
                 log_x=True, size_max=30,
                 title="MoMA artworks created after 1980")
fig6.show()

In [15]:
# axes match fig5 and fig6: the log scale applies to the counts, not to the years
fig7 = px.scatter(df_nats_rhizome,
                 x="Count", y="DateAcquired", size="Females", color="Nation",
                 log_x=True, size_max=30,
                 title="Rhizome artworks")
fig7.show()

In [16]:
before80['Source'] = 'Old'
after80['Source'] = 'New'
MoMA_complete = pd.concat([before80, after80], axis=0, ignore_index=True)
MoMA_complete

Unnamed: 0,Title,Artist,ID,DateCreated,Medium,Department,DateAcquired,URL,ThumbnailURL,Nationality,Gender,Source
0,"Ferdinandsbrücke Project, Vienna, Austria (Ele...",Otto Wagner,6210,1896,Ink and cut-and-pasted painted pages on paper,Architecture & Design,1996,http://www.moma.org/collection/works/2,http://www.moma.org/media/W1siZiIsIjU5NDA1Il0s...,Austrian,M,Old
1,"Villa near Vienna Project, Outside Vienna, Aus...",Emil Hoppe,7605,1903,"Graphite, pen, color pencil, ink, and gouache ...",Architecture & Design,1997,http://www.moma.org/collection/works/4,http://www.moma.org/media/W1siZiIsIjk4Il0sWyJw...,Austrian,M,Old
2,"Villa, project, outside Vienna, Austria, Exter...",Emil Hoppe,7605,1903,"Graphite, color pencil, ink, and gouache on tr...",Architecture & Design,1997,http://www.moma.org/collection/works/6,http://www.moma.org/media/W1siZiIsIjEyNiJdLFsi...,Austrian,M,Old
3,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1976,Gelatin silver photograph,Architecture & Design,1995,http://www.moma.org/collection/works/7,http://www.moma.org/media/W1siZiIsIjE0OCJdLFsi...,missing,M,Old
4,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,1976,Gelatin silver photographs,Architecture & Design,1995,http://www.moma.org/collection/works/8,http://www.moma.org/media/W1siZiIsIjE0OSJdLFsi...,missing,M,Old
...,...,...,...,...,...,...,...,...,...,...,...,...
138146,Cóctel (Cocktail),Alejandro Kuropatwa,132939,1996,Chromogenic color print,Photography,2020,missing,missing,Argentine,missing,New
138147,Cóctel (Cocktail),Alejandro Kuropatwa,132939,1996,Chromogenic color print,Photography,2020,missing,missing,Argentine,missing,New
138148,Cóctel (Cocktail),Alejandro Kuropatwa,132939,1996,Chromogenic color print,Photography,2020,missing,missing,Argentine,missing,New
138149,Cóctel (Cocktail),Alejandro Kuropatwa,132939,1996,Chromogenic color print,Photography,2020,missing,missing,Argentine,missing,New


In [17]:
MoMA_complete.Department = MoMA_complete.Department.astype('str')
departments = ', '.join(MoMA_complete.Department)
departments = list(set(departments.split(', ')))
df_moma_complete = pd.DataFrame()

for dep in departments:
    nat_sub =  MoMA_complete[MoMA_complete.Department == dep]    
    nat_sub.DateAcquired = nat_sub.DateAcquired.astype('str')
    years = ', '.join(nat_sub.DateAcquired)
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = nat_sub[nat_sub.DateAcquired == year]
        entries_year = len(year_sub)
        f_count = len(year_sub[year_sub.Gender == 'F'])
        new_row = pd.DataFrame([[dep, year, entries_year, f_count]])
        df_moma_complete = pd.concat([df_moma_complete, new_row], axis=0, ignore_index=True)

df_moma_complete.columns= ["Department", "DateAcquired", "Count", "Females"]
df_moma_complete 



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



Unnamed: 0,Department,DateAcquired,Count,Females
0,Drawings & Prints,1929,9,0
1,Drawings & Prints,1930,3,0
2,Drawings & Prints,1931,1,0
3,Drawings & Prints,1932,14,0
4,Drawings & Prints,1933,2,0
...,...,...,...,...
429,Architecture & Design,2015,1390,92
430,Architecture & Design,2016,503,127
431,Architecture & Design,2017,101,14
432,Architecture & Design,2018,857,339


In [18]:
   
df_moma_complete.DateAcquired = df_moma_complete.DateAcquired.astype('int')

In [19]:
fig8 = px.area(df_moma_complete, x="DateAcquired", y="Count", color="Department", line_group="Department")
fig8.update_layout(
     xaxis = dict(
        dtick = 10
    )
   
)

fig8.show()

In [20]:
fig9 = px.scatter(df_moma_complete,
                 x="Count", y="DateAcquired", size="Females", color="Department",
                 log_x=True, size_max=30,
                 title="MoMA's departments and acquisition of works by female artists")
fig9.show()

Doughnut charts made by Laurent for comparison

In [21]:
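#copy and add before/after value
# NOTE: works created before 1980 are tagged 'Contemporary' and works created
# after 1980 'Modern'. The labels look swapped, but later cells rely on them.
old = before80.copy()
old['Period'] = 'Contemporary'
new = after80.copy()
new['Period'] = 'Modern'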

#merge 
frames = [old, new]
moma_artworks_two_periods = pd.concat(frames)


In [22]:
#count occurrences of nationality values before and after 1980
before_n = old['Nationality'].value_counts().rename_axis('Nationality').reset_index(name='Contemporary')
after_n = new['Nationality'].value_counts().rename_axis('Nationality').reset_index(name='Modern')
moma_1980_x_dep = pd.merge(before_n, after_n, left_on='Nationality', right_on='Nationality')
moma_1980_x_dep

Unnamed: 0,Nationality,Contemporary,Modern
0,American,39754,17209
1,French,21683,906
2,German,6666,2573
3,missing,5740,568
4,British,4035,1541
...,...,...,...
191,"German, British",1,4
192,"Dutch, Dutch, Dutch",1,6
193,"French, Italian",1,5
194,"Swiss, German",1,1


In [23]:
better = moma_1980_x_dep[moma_1980_x_dep.Contemporary > 100]

In [24]:
better

Unnamed: 0,Nationality,Contemporary,Modern
0,American,39754,17209
1,French,21683,906
2,German,6666,2573
3,missing,5740,568
4,British,4035,1541
5,Spanish,2566,558
6,Italian,2130,644
7,Russian,2081,109
8,Japanese,1695,728
9,Swiss,1367,738


In [25]:
labels = moma_1980_x_dep.Nationality
# Create subplots: use 'domain' type for Pie subplot
fig10 = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig10.add_trace(go.Pie(labels=labels, values=moma_1980_x_dep['Contemporary']),
              1, 1)
fig10.add_trace(go.Pie(labels=labels, values=moma_1980_x_dep['Modern']),
              1, 2)

# Use `hole` to create a donut-like pie chart
fig10.update_traces(hole=.4, hoverinfo="label+percent", textposition='inside')

fig10.update_layout(
    title_text="MoMA Collection by Nationality and Period",
    # Add annotations in the center of the donut pies.
    annotations=[dict(text='Old', x=0.195, y=0.5, font_size=20, showarrow=False),
                 dict(text='New', x=0.820, y=0.5, font_size=20, showarrow=False)], width=1200)
fig10.show()

In [26]:
#copy and add before/after value
rhizNats = rhizome.copy()
rhizNats['Source'] = 'R'
Mnew_nats = after80.copy()
Mnew_nats['Source'] = 'M'

#merge 
frames = [Mnew_nats, rhizNats]
twoSrcs = pd.concat(frames)

In [27]:
rhz_n = rhizNats['Nationality'].value_counts().rename_axis('Nationality').reset_index(name='R')
Mnew_n = Mnew_nats['Nationality'].value_counts().rename_axis('Nationality').reset_index(name='M')
MR_nats_comp = pd.merge(rhz_n, Mnew_n, left_on='Nationality', right_on='Nationality')
MR_nats_comp

Unnamed: 0,Nationality,R,M
0,missing,609,568
1,American,481,17209
2,French,133,906
3,German,88,2573
4,Canadian,86,527
...,...,...,...
62,"Russian, Russian",1,24
63,"British, British, American",1,1
64,Bulgarian,1,8
65,Egyptian,1,58


## 4/ What is the nationality representation in each dataset? Rhizome, MoMA (full)


Viz: stacked area chart (Chiara) + breakdown by decades 
Story: AMERICA!!! BUT ALSO FRANCE!!!! Basically colonialism. 

Does nationality representation change over time as a percentage of the total acquisitions? Is there a way to talk about representation based on nationality in the story?
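
One way to approach the percentage question is to divide each nationality's yearly count by the total acquired in that year. This is a minimal sketch on toy numbers (illustrative only, not taken from the collections); the `Share` column name is hypothetical.

```python
import pandas as pd

# Toy yearly counts per nationality; values are illustrative only.
yearly = pd.DataFrame({
    "Nationality": ["American", "French", "American", "French"],
    "DateAcquired": [1968, 1968, 1974, 1974],
    "Count": [30, 70, 80, 20],
})

# Total acquisitions per year, aligned back onto every row of that year.
year_total = yearly.groupby("DateAcquired")["Count"].transform("sum")

# Each nationality's percentage of that year's acquisitions.
yearly["Share"] = yearly["Count"] / year_total * 100
print(yearly)
```

Plotting `Share` instead of `Count` in a stacked area chart would show changes in relative representation even when the total number of acquisitions varies a lot from year to year. Plotly Express also offers a `groupnorm="percent"` argument on `px.area` that normalises the stacked traces at plot time.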



#### Plots: Area charts 
##### Variables: Acquisition date, Nationality, Artworks for nationality

In [28]:
# create working df for MoMA complete
# note: this overwrites the per-department df_moma_complete built earlier
nationalities = ', '.join(MoMA_complete.Nationality)
nationalities = list(set(nationalities.split(', ')))
df_moma_complete = pd.DataFrame()

for nat in nationalities:
    # .copy() avoids pandas' SettingWithCopyWarning on the assignment below
    nat_sub = MoMA_complete[MoMA_complete.Nationality == nat].copy()
    nat_sub.DateAcquired = nat_sub.DateAcquired.astype('str')
    years = ', '.join(nat_sub.DateAcquired)
    # 4-character year strings, sorted; the last sorted entry is dropped
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = nat_sub[nat_sub.DateAcquired == year]
        entries_year = len(year_sub)
        f_count = len(year_sub[year_sub.Gender == 'F'])
        new_row = pd.DataFrame([[nat, year, entries_year, f_count]])
        df_moma_complete = pd.concat([df_moma_complete, new_row], axis=0, ignore_index=True)

df_moma_complete.columns= ["Nationality", "DateAcquired", "Count", "Females"]

df_moma_complete.DateAcquired = df_moma_complete.DateAcquired.astype('int')
df_moma_complete



Unnamed: 0,Nationality,DateAcquired,Count,Females
0,Bahamian,2000,1,1
1,Bahamian,2001,1,1
2,Argentine,1942,27,7
3,Argentine,1943,1,0
4,Argentine,1954,29,0
...,...,...,...,...
2441,Croatian,2018,2,0
2442,Croatian,2019,2,0
2443,Emirati,2019,5,5
2444,Sudanese,1965,1,0


In [29]:
# create working df for Rhizome
nationalities = ', '.join(rhizome.Nationality)
nationalities = list(set(nationalities.split(', ')))
df_rhz_nats = pd.DataFrame()

for nat in nationalities:
    # .copy() avoids pandas' SettingWithCopyWarning on the assignment below
    nat_sub = rhizome[rhizome.Nationality == nat].copy()
    nat_sub.dateAcquired = nat_sub.dateAcquired.astype('str')
    years = ', '.join(nat_sub.dateAcquired)
    # 4-character year strings, sorted; the last sorted entry is dropped
    years = sorted(list([item[:4] for item in list(set(years.split(', ')))]))[0:-1]
    for year in years:
        year_sub = nat_sub[nat_sub.dateAcquired == year]
        entries_year = len(year_sub)
        f_count = len(year_sub[year_sub.Gender == 'F'])
        new_row = pd.DataFrame([[nat, year, entries_year, f_count]])
        df_rhz_nats = pd.concat([df_rhz_nats, new_row], axis=0, ignore_index=True)

df_rhz_nats.columns= ["Nationality", "dateAcquired", "Count", "Females"]

df_rhz_nats.dateAcquired = df_rhz_nats.dateAcquired.astype('int')
df_rhz_nats



Unnamed: 0,Nationality,dateAcquired,Count,Females
0,Italian,2001,1,0
1,Argentine,2001,2,0
2,Argentine,2002,2,1
3,Argentine,2003,2,0
4,Argentine,2004,6,4
...,...,...,...,...
281,Israeli,2006,1,0
282,Israeli,2007,1,1
283,Israeli,2008,3,0
284,Croatian,2003,1,1


In [30]:
# create MoMA's decades
# upper bounds are exclusive, so each decade keeps its final year (1949, 1959, ...)
df1 = df_moma_complete.copy()
dfmissing = df1[df1.DateAcquired == 0]
df40 = df1[(df1.DateAcquired >0) &(df1.DateAcquired <1950)]  # 1940s and earlier
df50 = df1[(df1.DateAcquired >=1950) &(df1.DateAcquired <1960)]
df60 = df1[(df1.DateAcquired >=1960) &(df1.DateAcquired <1970)]
df70 = df1[(df1.DateAcquired >=1970) &(df1.DateAcquired <1980)]
df80 = df1[(df1.DateAcquired >=1980) &(df1.DateAcquired <1990)]
df90 = df1[(df1.DateAcquired >=1990) &(df1.DateAcquired <2000)]
df2000 = df1[(df1.DateAcquired >=2000) &(df1.DateAcquired <2010)]
df2010 = df1[(df1.DateAcquired >=2010) &(df1.DateAcquired <2020)]
df2020 = df1[(df1.DateAcquired >=2020) &(df1.DateAcquired <2030)]

In [31]:
# create Rhizome's decades
# upper bounds are exclusive, so each decade keeps its final year (2009, 2019, ...)
df2 = df_rhz_nats.copy()
df2missing = df2[df2.dateAcquired == 0]
df290 = df2[(df2.dateAcquired >0) &(df2.dateAcquired <2000)]
df22000 = df2[(df2.dateAcquired >=2000) &(df2.dateAcquired <2010)]
df22010 = df2[(df2.dateAcquired >=2010) &(df2.dateAcquired <2020)]
df22020 = df2[(df2.dateAcquired >=2020) &(df2.dateAcquired <2030)]
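
An alternative to hand-written decade bounds is to derive the decade from the year with integer division, so that boundary years such as 1949 always land in their own decade. This is a sketch on toy values (illustrative only); the `Decade` column name is hypothetical, and `0` is assumed to mark a missing acquisition year, as in the cells above.

```python
import pandas as pd

# Toy acquisition counts; 0 stands for a missing year. Values are illustrative only.
acq = pd.DataFrame({
    "DateAcquired": [0, 1942, 1949, 1950, 1999, 2020],
    "Count": [4, 27, 3, 5, 8, 2],
})

# Drop rows with a missing year, then map each year to the start of its decade.
known = acq[acq["DateAcquired"] > 0].copy()
known["Decade"] = known["DateAcquired"] // 10 * 10

per_decade = known.groupby("Decade")["Count"].sum()
print(per_decade)
```

With a `Decade` column in place, a single `groupby` (or a facetted plot) can replace one DataFrame per decade.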

### MoMA complete DF -- acquisition of nationalities over decades

In [32]:
#plot Acquisition distribution over decades
fig11 = px.area(df40, x="DateAcquired", y="Count", color="Nationality", line_group="Nationality", title="Artworks acquired in 1940's")
fig11.show()
fig12 = px.area(df50, x="DateAcquired", y="Count", color="Nationality", line_group="Nationality", title="Artworks acquired in 1950's")
fig12.show()
fig13 = px.area(df60, x="DateAcquired", y="Count", color="Nationality", line_group="Nationality", title="Artworks acquired in 1960's")
fig13.show()
fig14 = px.area(df70, x="DateAcquired", y="Count", color="Nationality", line_group="Nationality", title="Artworks acquired in 1970's")
fig14.show()

fig15 = px.area(df80, x="DateAcquired", y="Count", color="Nationality", line_group="Nationality", title="Artworks acquired in 1980's")
fig15.show()
fig16 = px.area(df90, x="DateAcquired", y="Count", color="Nationality", line_group="Nationality", title="Artworks acquired in 1990's")
fig16.show()
fig17 = px.area(df2000, x="DateAcquired", y="Count", color="Nationality", line_group="Nationality", title="Artworks acquired in 2000's")
fig17.show()
fig18 = px.area(df2010, x="DateAcquired", y="Count", color="Nationality", line_group="Nationality", title="Artworks acquired in 2010's")
fig18.show()


### Rhizome complete DF -- acquisition of nationalities over decades

In [33]:
#plot Acquisition distribution over decades

fig20 = px.area(df22000, x="dateAcquired", y="Count", color="Nationality", line_group="Nationality", title="Artworks acquired in 2000's")
fig20.show()
fig21 = px.area(df22010, x="dateAcquired", y="Count", color="Nationality", line_group="Nationality", title="Artworks acquired in 2010's")
fig21.show()



### MoMA complete DF -- acquisitions by country (map)

In [97]:
# create MoMA df with country names instead of nationalities
natios_MOMA = set(df_moma_complete.Nationality)
# The demonyms CSV appears to have no header row, so pandas takes its first record
# ('Aalborgenser', 'Aalborg') as column names: first column = demonym, second = place.
# `missing` adds demonyms from the MoMA data that are presumably absent from the CSV.
missing = pd.DataFrame({'Aalborgenser': ['Korean', 'Native American', 'Canadian Inuit'], 'Aalborg': ['Korea', 'United States', 'Canada']})
df_natParse = pd.read_csv('https://raw.githubusercontent.com/knowitall/chunkedextractor/master/src/main/resources/edu/knowitall/chunkedextractor/demonyms.csv')
correct_country = pd.concat([missing, df_natParse])
Country_df=df_moma_complete.copy()
for item in natios_MOMA:
    my = correct_country[correct_country['Aalborgenser'] == item]
    # take the first matching place, if any; unmatched nationalities stay unchanged
    country = my[:1]['Aalborg'].values
    if len(country)>0:
        Country_df.loc[Country_df["Nationality"] == item, "Nationality"] = country[0]
Country_df

Unnamed: 0,Nationality,DateAcquired,Count,Females
0,The Bahamas,2000,1,1
1,The Bahamas,2001,1,1
2,Argentina,1942,27,7
3,Argentina,1943,1,0
4,Argentina,1954,29,0
...,...,...,...,...
2441,Croatia,2018,2,0
2442,Croatia,2019,2,0
2443,United Arab Emirates,2019,5,5
2444,Sudan,1965,1,0
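
The demonym-to-country replacement can also be written as a single lookup with `Series.map`. This is a sketch assuming a small hand-written dictionary (the three pairs match the output above) rather than the full demonyms CSV; the `Country` column name is hypothetical.

```python
import pandas as pd

# Hand-written lookup for illustration; the notebook builds its lookup from a CSV.
demonym_to_country = {
    "Argentine": "Argentina",
    "Croatian": "Croatia",
    "Sudanese": "Sudan",
}

nat_df = pd.DataFrame({"Nationality": ["Argentine", "Croatian", "Sudanese", "missing"]})

# Map known demonyms to countries; keep the original value when there is no match.
nat_df["Country"] = nat_df["Nationality"].map(demonym_to_country).fillna(nat_df["Nationality"])
print(nat_df)
```

Keeping unmatched values unchanged mirrors the loop, which only rewrites a nationality when a matching row is found.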


In [98]:
countries_count = pd.DataFrame(columns= ['Nation', 'Count'])

new_set = set(Country_df.Nationality)
for item in new_set:
    subCountry = Country_df[Country_df['Nationality'] == item]
    sum_acquisitions = subCountry['Count'].sum()
    countries_count.loc[len(countries_count.index)] = [item, sum_acquisitions]

countries_count

Unnamed: 0,Nation,Count
0,Lithuania,12
1,Ireland,23
2,Romania,70
3,Poland,529
4,The Bahamas,2
...,...,...
89,Czech Republic,764
90,Mali,20
91,Spain,2965
92,Algeria,3


In [99]:
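def do_fuzzy_search(country):
    """Return the ISO 3166-1 alpha-3 code of the best fuzzy match, or NaN if none."""
    try:
        result = pycountry.countries.search_fuzzy(country)
        return result[0].alpha_3
    except LookupError:
        # pycountry raises LookupError when nothing matches
        return np.nan

countries_count["country_code"] = countries_count["Nation"].apply(do_fuzzy_search)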


In [219]:
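# create MoMA map plot
# NOTE: 'Count_display' is not created in any cell shown above; it is presumably
# a display-oriented version of 'Count' and must exist before this cell runs
fig = go.Figure(data=go.Choropleth(
    locations = countries_count['country_code'],
    z = countries_count['Count_display'],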
    text = countries_count['Nation'],
    colorscale=[
            [0,"#8e5a79"],
            [0.3 ,"#925f7d"],
            [0.325 ,"#966582"],
            [0.350 ,"#9a6a87"],
            [0.375,"#9e708b"],
            [0.4 ,"#a27590"],
            [0.525 ,"#b690a6"],
            [0.550 ,"#ba96ab"],
            [0.575,"#be9baf"],
            [0.6 ,"#c2a0b4"],
            [0.625 ,"#c6a6b8"],
            [0.650 ,"#caabbd"],
            [0.675,"#ceb1c2"],
            [0.7 ,"#d2b6c6"],
            [0.725 ,"#d6bccb"],
            [0.995 ,"#d9c1cf"],
            [1, "#FFFFFF"]],
    autocolorscale=False,
    reversescale=True,
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_title = 'Total artworks',
))

fig.update_layout(
    title_text='MoMA: total number of artworks by nation',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='orthographic',
        showocean=True, oceancolor="LightBlue"
    ),
    height = 700,
)

fig.show()