# UC Publishing Activity in CUP Journals
## Interactive Graphs

# Getting Started:

## Click the code cell below and run it. You can either press the "Run" button at the top of the browser while the cell is selected, or select the cell and press Shift and Enter

In [30]:
#These lines load in the packages used for data manipulation and plotting

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import pickle5 as pickle

#This line loads in the data

with open("publication_df.pickle", "rb") as fh:
    publication_df = pickle.load(fh)

#This line keeps only publications from 1990 onward; the "since 1990" plots below use new_df2

new_df2 = publication_df[publication_df['year'] >= 1990]

## Run the below box to see the number of publications per year going back to 1990

Hover over the plot to see the number of publications that year. You can select ranges by dragging over an area. Double click to go back to default view. 

In [31]:
new_df = new_df2['year'].value_counts().rename_axis('year').reset_index(name='counts')

fig_test = px.bar(new_df, y="year", x="counts", color="counts",
                  title="UC Publications in CUP Journals since 1990",
                  orientation='h', width=600, height=1200)
fig_test.show()
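
As a minimal sketch of what the counting step produces, the same `value_counts` chain can be run on a small made-up table. The `toy` data below is invented for illustration and is not part of `publication_df`. The `sort_values` call is an addition in this sketch, since `value_counts` orders rows by frequency rather than by year.

```python
import pandas as pd

# Made-up stand-in for the 'year' column of new_df2 (illustration only)
toy = pd.DataFrame({"year": [2019, 2020, 2020, 2021, 2021, 2021]})

counts = (
    toy["year"]
    .value_counts()               # Series: index = year, values = number of rows
    .rename_axis("year")          # name the index so reset_index yields a 'year' column
    .reset_index(name="counts")   # back to a two-column DataFrame: year, counts
    .sort_values("year")          # order by year instead of by frequency
    .reset_index(drop=True)
)
print(counts)
```

Each row of `counts` pairs one year with the number of publications in that year, which is the shape `px.bar` expects for its `x` and `y` arguments.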

### Note: run the line below to explore the publication info for any given year; just change the year value

For example, if you want to see the publications from 2016, change 1965 to 2016 in the line below

In [37]:
publication_df[publication_df['year']==1965]

Unnamed: 0,authors,title,concepts,pub_id,category_for,times_cited,year,open_access_categories_v2,journal.id,journal.title,...,aff_city_id,aff_country,aff_country_code,aff_state,aff_state_code,researcher_id,first_name,last_name,oa_status,oa_status_2
398,[{'raw_affiliation': ['University of Cincinnat...,Reminiscences; General of the Army Douglas Mac...,"[VIII, index, General, maps, McGraw, MacArthur...",pub.1069714714,,0,1965,[closed],jour.1123672,The Journal of Asian Studies,...,4508722.0,United States,US,Ohio,US-OH,ur.013526600141.25,harold m.,vinacke,closed,closed
399,[{'raw_affiliation': ['University of Cincinnat...,The Rise and Fall of Western Colonialism: A Hi...,"[early nineteenth century, Western colonialism...",pub.1070712499,"[{'id': '3577', 'name': '2002 Cultural Studies...",0,1965,[closed],jour.1041487,The Americas A Quarterly Review of Latin Ameri...,...,4508722.0,United States,US,Ohio,US-OH,ur.012721272371.07,herbert f.,curry,closed,closed
400,[{'raw_affiliation': ['University of Cincinnat...,On the compatibility of the recent solar paral...,,pub.1027632258,,2,1965,"[oa_all, gold]",jour.1159373,Symposium - International Astronomical Union,...,4508722.0,United States,US,Ohio,US-OH,ur.010453766071.56,eugene,rabe,gold,open
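
The full table is wide, so it can help to look at only a few columns for a given year. The sketch below assumes a hypothetical helper, `publications_in_year`, which is not part of the original notebook; the column names are the ones shown in the output above, and the `toy` rows are invented for illustration.

```python
import pandas as pd

def publications_in_year(df, year, columns=("title", "journal.title", "oa_status")):
    """Return the rows of df published in `year`, restricted to a few readable columns.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    return df.loc[df["year"] == year, list(columns)]

# Made-up rows standing in for publication_df
toy = pd.DataFrame({
    "year": [1965, 1965, 2016],
    "title": ["Paper A", "Paper B", "Paper C"],
    "journal.title": ["Journal X", "Journal Y", "Journal X"],
    "oa_status": ["closed", "gold", "green"],
    "times_cited": [0, 2, 5],
})

subset = publications_in_year(toy, 1965)
print(subset)
```

With the real data loaded, the equivalent call would be `publications_in_year(publication_df, 2016)`.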


## Run the below box to see the number of publications per journal going back to 1990

Hover over the plot to see the number of publications in that journal during this time range. You can select ranges by dragging over an area. Double click to go back to default view. 

In [32]:
journals_df = new_df2['journal.title'].value_counts().rename_axis('journal.title').reset_index(name='counts')

# Keep only journals with more than one publication, sorted by count
journals_df = journals_df[journals_df["counts"] > 1].sort_values(by="counts")

fig_test = px.bar(journals_df, y="journal.title", x="counts", color="counts",
                  title="CUP Journals with the Most UC Publications since 1990",
                  orientation='h', width=600, height=1200)
fig_test.show()

## Run the below box to see the number of publications per year colored by journal title. 
Hover over the colored stacks to see the journal title and number of publications per year per journal. You can select ranges by dragging over an area. Double click to go back to default view. 

In [33]:
publications_per_year_and_journal = new_df2.groupby(["year","journal.title"],as_index=False).size()
fig_test2 = px.bar(publications_per_year_and_journal, x="year", y="size", color="journal.title", title="UC Publications in CUP Journals since 1990")

fig_test2.update_layout(showlegend=False)
fig_test2.show()
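
The stacked bars are built from a long-form table with one row per (year, journal) pair. A minimal sketch of that grouping step on made-up data follows; the `toy` table is invented for illustration, and `as_index=False` with `.size()` assumes pandas 1.1 or newer.

```python
import pandas as pd

# Made-up stand-in for new_df2 (illustration only)
toy = pd.DataFrame({
    "year": [2020, 2020, 2020, 2021],
    "journal.title": ["Journal A", "Journal A", "Journal B", "Journal A"],
})

# One row per (year, journal) pair; the 'size' column holds the number of publications
per_year_and_journal = toy.groupby(["year", "journal.title"], as_index=False).size()
print(per_year_and_journal)
```

Passing `color="journal.title"` to `px.bar` then stacks the rows that share a year into a single bar, one colored segment per journal.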

## Run the below box to see the number of publications per year colored by OA type, including closed publications
Hover over the colored stacks to see the number of publications of each type (OA type or closed) per year. You can select ranges by dragging over an area. Double click to go back to default view. 

In [35]:
oa_stack_df = new_df2.groupby(["year","oa_status"],as_index=False).size()

fig = px.bar(oa_stack_df, x="year", y="size", color="oa_status",
             title="UC Publications in CUP: Publication Type per Year")
fig.show()

## Run the below box to see the number of publications per year colored by OA type, OA publications only

Hover over the colored stacks to see the number of OA publications of each type per year. You can select ranges by dragging over an area. Double click to go back to default view. 

In [36]:
# Split the full dataset (all years) into OA and closed publications
oa_df = publication_df[publication_df["oa_status"] != "closed"]
non_oa_df = publication_df[publication_df["oa_status"] == "closed"]

# Count OA publications per (year, OA status) pair
oa_stack_df = oa_df.groupby(["year","oa_status"],as_index=False).size()

fig = px.bar(oa_stack_df, x="year", y="size", color="oa_status",
             title="UC Open Access Publications in CUP")
fig.show()

### Note: run the line below to explore the OA publication info for any given year; just change the year value

For example, if you want to see the publications from 2016, change 1965 to 2016 in the line below

In [40]:
oa_df[oa_df['year']==1965]

Unnamed: 0,authors,title,concepts,pub_id,category_for,times_cited,year,open_access_categories_v2,journal.id,journal.title,...,aff_city_id,aff_country,aff_country_code,aff_state,aff_state_code,researcher_id,first_name,last_name,oa_status,oa_status_2
400,[{'raw_affiliation': ['University of Cincinnat...,On the compatibility of the recent solar paral...,,pub.1027632258,,2,1965,"[oa_all, gold]",jour.1159373,Symposium - International Astronomical Union,...,4508722.0,United States,US,Ohio,US-OH,ur.010453766071.56,eugene,rabe,gold,open


## Run the below box to see the OA publications by Journal Title, colored by OA Type. Only OA publications

Hover over the colored stacks to see the number of OA publications of each type per journal. Hovering also shows the journal title. You can select ranges by dragging over an area. Double click to go back to default view. 

In [44]:
oa_stacked = oa_df.groupby(["oa_status","journal.title"],as_index=False).size()

fig = px.bar(oa_stacked, x="size", y="journal.title", color="oa_status", 
             orientation="h", title="UC OA Publications in CUP Journals")
fig.update_layout(showlegend=False, yaxis={'categoryorder':'total ascending'})

fig.show()

## Run the below box to see the number of different kinds of OA publications per journal between 1990 and 2021

Hover over the squares in the plot below to see the journal title, the type of publications (gold OA, bronze OA, closed...), and the number of publications in that journal of that type. You can select ranges by dragging over an area. Double click to go back to default view. 

In [16]:
# Count publications per (OA status, journal) pair, using the 1990-onward subset to match the heading
df_OA = new_df2.groupby(["oa_status","journal.title"],as_index=False).size()

# One row per journal, one column per OA status; values="size" keeps the column labels flat
df_OA2 = df_OA.pivot(index="journal.title", columns="oa_status", values="size")

# Drop journals with fewer than 5 publications in total
df_OA3 = df_OA2.drop(df_OA2[df_OA2.sum(axis=1) < 5].index)

# Transpose so OA status runs down the y-axis and journals run along the x-axis
df_OA3 = df_OA3.T
#df_OA3 = df_OA3.fillna(0)

# Same table without the "closed" row, used in the next cell
df_OA4 = df_OA3.drop(index="closed")

fig = px.imshow(df_OA3, width=1400, height=600, color_continuous_scale="turbo", title = "Number and Type of OA Publication by Journal Title")
fig.show()

## This is the same as above but with the "closed" category removed

In [17]:
fig = px.imshow(df_OA4, width=1400, height=600, color_continuous_scale="turbo", title = "Number and Type of OA Publication by Journal Title")
fig.show()
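
The reshaping behind both heatmaps can be sketched on a small made-up table of counts. The `toy_counts` values are invented for illustration, and the threshold of 5 mirrors the one used in the cell that builds the heatmap data.

```python
import pandas as pd

# Made-up long-form counts, shaped like the output of groupby(...).size()
toy_counts = pd.DataFrame({
    "oa_status": ["closed", "gold", "closed", "green"],
    "journal.title": ["Journal A", "Journal A", "Journal B", "Journal B"],
    "size": [4, 2, 3, 1],
})

# Long to wide: one row per journal, one column per OA status.
# Combinations that never occur become NaN.
wide = toy_counts.pivot(index="journal.title", columns="oa_status", values="size")

# Keep journals with at least 5 publications in total (NaN is skipped by sum)
kept = wide[wide.sum(axis=1) >= 5]

# Transpose so OA status is on the rows, as px.imshow will draw it
heatmap_data = kept.T
print(heatmap_data)
```

Cells left as NaN show up as gaps in the heatmap; filling them with 0 via `fillna(0)` is one option if empty squares should be colored as zero instead.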

# Exploratory Plots

## Run the below box to see box plots comparing "times cited" between OA and Non-OA publications

Hover over the box plots to see the quartile and fence values. You can select ranges by dragging over an area. Double click to go back to default view. 

In [41]:
fig = px.box(publication_df, x="oa_status_2",y="times_cited")
fig.update_traces(quartilemethod="inclusive") # or "exclusive", or "linear" by default
fig.show()
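
To read the comparison as numbers rather than from the plot, a grouped summary is one option. This is a minimal sketch on made-up values; the `toy` table is invented for illustration, and only the column names `oa_status_2` and `times_cited` come from the dataset. The median is used because it does not depend on which quartile method the plot uses.

```python
import pandas as pd

# Made-up citation counts (illustration only)
toy = pd.DataFrame({
    "oa_status_2": ["open", "open", "open", "closed", "closed", "closed", "closed"],
    "times_cited": [0, 2, 10, 0, 1, 1, 4],
})

# Count, median, and maximum of times_cited within each group
summary = toy.groupby("oa_status_2")["times_cited"].agg(["count", "median", "max"])
print(summary)
```

With the real data, replacing `toy` with `publication_df` gives the same summary for the publications shown in the box plots.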

## Run the below box to see box plots comparing "altmetrics" between OA and Non-OA publications

Hover over the box plots to see the quartile and fence values. You can select ranges by dragging over an area. Double click to go back to default view. 

In [42]:
fig = px.box(publication_df, x="oa_status_2",y="altmetric")
fig.update_traces(quartilemethod="inclusive") # or "exclusive", or "linear" by default
fig.show()

In [45]:
years = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]

# Keep publications from 2010 to 2021 that have an altmetric score of at least 1
df1 = publication_df[(publication_df['altmetric'] >= 1) &
                     publication_df['year'].isin(years)]

figscatter = px.scatter(df1, x="year", y="altmetric", 
                        size = "times_cited",
                        color='journal.title',
                        hover_data=['title',"first_name","last_name", "oa_status"])



figscatter.show()

In [46]:
years = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]

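# Keep publications from 2010 to 2021 that have at least one citation and an altmetric score of at least 1
df_scat = publication_df[(publication_df['times_cited'] >= 1) &
                         (publication_df['altmetric'] >= 1) &
                         publication_df['year'].isin(years)]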

figscatter2 = px.scatter(df_scat, x="year", y="times_cited", 
                        size = "altmetric",
                        color='journal.title',
                        hover_data=['title',"oa_status"])



figscatter2.show()