# Analyse des prix des carburants

Voici un exemple décrit pas à pas qui montre comment on analyse des données publiques disponibles en ligne. Ici on va travailler sur le prix de différentes énergies à partir des données de la [base Pégase](http://developpement-durable.bsocom.fr/Statistiques/) du ministère du développement durable.

La première chose est de sauver les tables qui nous intéressent à savoir 
* les données mensuelles de l'énergie, tarifs domestiques, produits pétroliers. On sauve les données en cliquant sur l'icône dossier, on utilise le format ascii délimité par des points-virgules (.csv). Le fichier est nomé `pegase_prix_petrole_particulier.csv`. 
* le prix des importation de pétrole qu'on sauve au même format dans `pegase_import_petrole.csv`


### Préparer son environnement

Je suppose que tout est bien installé sur votre machine (cf [ici](https://docs.python.org/fr/3.9/installing/index.html) pour des explications). On commence donc par charger les bibliothèques Python dont j'ai besoin :

In [1]:
import numpy as np               # bibliothèque qui ajoute les tables
import pandas as pd              # bibliothèque qui ajoute les tableaux (tableurs)
import matplotlib.pyplot as plt  # bibliothèque graphique pour tracer des courbes
from chart_studio import plotly
import plotly.graph_objs as go
import plotly
import geopandas


# commandes magiques Jupyter pour voir les graphiques dans cette page :
%matplotlib inline                           
%config InlineBackend.figure_format = 'retina'

### Préparation des données

On charge un fichier `.csv` avec la commande de Pandas `read_csv`. C'est l'équivalent d'ouvrir le fichier avec Excel ou LibreOffice et de répondre aux questions sur le formatage du fichier (séparteur est `;`, l'encodage du fichier est du `latin1`).
On ignore aussi les lignes qui ne nous intéressent pas.

In [2]:
impacts = pd.read_csv("data/final_raw_sample_0_percent.csv", sep=",", encoding="utf-8")
impacts['Country'] = impacts['Country'].replace(['Czech Republic'], 'Czechia')
impacts['Country'] = impacts['Country'].replace(['United States'], 'United States of America')
#countries = pd.read_csv("data/countries_codes_and_coordinates.csv")
print(impacts)

       Year                    Company Name                   Country  \
0      2018              TELEPERFORMANCE SE                    France   
1      2018                          SGS SA               Switzerland   
2      2018              INTERTEK GROUP PLC            United Kingdom   
3      2018              APPLUS SERVICES SA                     Spain   
4      2018               BUREAU VERITAS SA                    France   
...     ...                             ...                       ...   
13172  2010  KINTETSU GROUP HOLDINGS CO LTD                     Japan   
13173  2010  NANKAI ELECTRIC RAILWAY CO LTD                     Japan   
13174  2010            MCKESSON CORPORATION  United States of America   
13175  2010       UNITED NATURAL FOODS,INC.  United States of America   
13176  2010              SPAR GROUP LIMITED              South Africa   

                                     Industry (Exiobase)  \
0      Activities auxiliary to financial intermediati...   
1  

### MAP

In [3]:
cur_df = impacts.copy()


In [4]:
cur_df = impacts.copy().dropna()

columns = cur_df.columns
columns
cur_df = cur_df.drop(columns[6:],1)
cur_df = cur_df.drop(columns[1],1)
cur_df = cur_df.drop(columns[3],1)
cur_df

  cur_df = cur_df.drop(columns[6:],1)
  cur_df = cur_df.drop(columns[1],1)
  cur_df = cur_df.drop(columns[3],1)


Unnamed: 0,Year,Country,Total Environmental Intensity (Revenue),Total Environmental Intensity (Operating Income)
0,2018,France,-1.09%,-10.05%
1,2018,Switzerland,-0.81%,-5.31%
2,2018,United Kingdom,-1.53%,-9.38%
3,2018,Spain,-2.26%,-35.02%
4,2018,France,-0.56%,-4.40%
...,...,...,...,...
13172,2010,Japan,-1.39%,-36.98%
13173,2010,Japan,-1.25%,-10.98%
13174,2010,United States of America,-0.04%,-2.26%
13175,2010,United States of America,-1.01%,-33.05%


In [5]:
#cur_df[cur_df.columns[2]] = 
cur_df[cur_df.columns[2]] = np.float64(cur_df[cur_df.columns[2]].map(lambda x: str(x)[:-1]))
cur_df[cur_df.columns[3]] = np.float64(cur_df[cur_df.columns[3]].map(lambda x: str(x)[:-1]))
means = cur_df.groupby(['Year', 'Country'])[cur_df.columns[2], cur_df.columns[3]].mean()
medians = cur_df.groupby(['Year', 'Country'])[cur_df.columns[2], cur_df.columns[3]].median()
#medians.rename({medians.columns[0] : "Median " + medians.columns[0], medians.columns[1] : "Median " + medians.columns[1]})
medians.columns = ["Median " + medians.columns[0],"Median " + medians.columns[1]]
means.columns = ["Mean " + means.columns[0],"Mean " + means.columns[1]]

print(medians)

                               Median Total Environmental Intensity (Revenue)  \
Year Country                                                                    
2010 Australia                                                         -5.270   
     Austria                                                           -2.010   
     Belgium                                                           -1.320   
     Bermuda                                                           88.550   
     Brazil                                                            -2.905   
...                                                                       ...   
2018 Turkey                                                            -4.175   
     Ukraine                                                          -20.020   
     United Arab Emirates                                              -1.665   
     United Kingdom                                                    -0.830   
     United States of Americ

  means = cur_df.groupby(['Year', 'Country'])[cur_df.columns[2], cur_df.columns[3]].mean()
  medians = cur_df.groupby(['Year', 'Country'])[cur_df.columns[2], cur_df.columns[3]].median()


In [6]:
means_medians_df = pd.merge(means, medians, left_index=True, right_index=True, how='outer')

means_medians_df = means_medians_df.reset_index()
means_medians_df

Unnamed: 0,Year,Country,Mean Total Environmental Intensity (Revenue),Mean Total Environmental Intensity (Operating Income),Median Total Environmental Intensity (Revenue),Median Total Environmental Intensity (Operating Income)
0,2010,Australia,-16.047317,-135.272439,-5.270,-30.360
1,2010,Austria,-12.838000,-107.348000,-2.010,-100.000
2,2010,Belgium,-10.421429,-77.172857,-1.320,-14.550
3,2010,Bermuda,88.550000,601.570000,88.550,601.570
4,2010,Brazil,-6.477000,-50.483000,-2.905,-21.500
...,...,...,...,...,...,...
478,2018,Turkey,-27.700000,-218.699000,-4.175,-64.065
479,2018,Ukraine,-20.020000,-110.590000,-20.020,-110.590
480,2018,United Arab Emirates,-1.665000,-12.345000,-1.665,-12.345
481,2018,United Kingdom,-5.083218,-49.031584,-0.830,-6.300


In [7]:
#means_medians_df = means_medians_df.loc[means_medians_df['Year'] == 2017]
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
world.columns=['pop_est', 'continent', 'Country', 'code', 'gdp_md_est', 'geometry']
world['gdp_per_cap'] = world['gdp_md_est'] / world['pop_est']
world['gdp_per_cap_color'] = 'green'
means_medians_loc_df = pd.merge(means_medians_df, world, on='Country', how='outer')
print(world)

       pop_est      continent                   Country code  gdp_md_est  \
0       920938        Oceania                      Fiji  FJI      8374.0   
1     53950935         Africa                  Tanzania  TZA    150600.0   
2       603253         Africa                 W. Sahara  ESH       906.5   
3     35623680  North America                    Canada  CAN   1674000.0   
4    326625791  North America  United States of America  USA  18560000.0   
..         ...            ...                       ...  ...         ...   
172    7111024         Europe                    Serbia  SRB    101800.0   
173     642550         Europe                Montenegro  MNE     10610.0   
174    1895250         Europe                    Kosovo  -99     18490.0   
175    1218208  North America       Trinidad and Tobago  TTO     43570.0   
176   13026129         Africa                  S. Sudan  SSD     20880.0   

                                              geometry  gdp_per_cap  \
0    MULTIPOLYGO

In [8]:
#means_medians_loc_df = means_medians_loc_df.drop('Year', axis=1)#.set_index('Country')

In [9]:
means_medians_loc_df.head(3)

Unnamed: 0,Year,Country,Mean Total Environmental Intensity (Revenue),Mean Total Environmental Intensity (Operating Income),Median Total Environmental Intensity (Revenue),Median Total Environmental Intensity (Operating Income),pop_est,continent,code,gdp_md_est,geometry,gdp_per_cap,gdp_per_cap_color
0,2010.0,Australia,-16.047317,-135.272439,-5.27,-30.36,23232413.0,Oceania,AUS,1189000.0,"MULTIPOLYGON (((147.68926 -40.80826, 148.28907...",0.051178,green
1,2011.0,Australia,-12.869565,-59.316739,-4.35,-31.69,23232413.0,Oceania,AUS,1189000.0,"MULTIPOLYGON (((147.68926 -40.80826, 148.28907...",0.051178,green
2,2012.0,Australia,-11.970784,-69.80098,-3.61,-32.74,23232413.0,Oceania,AUS,1189000.0,"MULTIPOLYGON (((147.68926 -40.80826, 148.28907...",0.051178,green


In [12]:

import json
with open("countries.json") as f:
    countries = json.load(f)

In [15]:
#print(means_medians_loc_df.head())
#print(world.head())
import plotly.express as px
import plotly.io as pio
pio.renderers.default = 'notebook'
#fig = plt.figure()
#ax1 = fig.add_subplot(111, figsize=(25, 17))
#plt.scatter(means_medians_loc_df[])
#means_medians_loc_df.plot(column='gdp_md_est',
#           figsize=(25, 17))
#means_medians_loc_df.plot(column='Mean Total Environmental Intensity (Revenue)',
#           style='scatter')
tmp = means_medians_loc_df.copy().dropna()
tmp.to_csv("tmp.csv")
#geom = tmp['geometry'].to_json()
fig_choro = px.choropleth_mapbox(tmp,
                    geojson=countries,#world.set_index('Country').geometry,
                    locations=tmp.Country,
                    color="gdp_md_est",#np.log10(abs(tmp[tmp.columns[4]])),
                    #projection="mercator",
                    mapbox_style="carto-positron")
                    #hover_name=tmp[tmp.columns[4]],
                    #color_continuous_scale="reds",
                    #range_color=(min(np.log10(abs(tmp[tmp.columns[4]]))), np.log10(abs(max(tmp[tmp.columns[4]])))))

fig = px.scatter_geo(tmp,
                    geojson=world.set_index('Country').geometry,
                    #lat=world.set_index('Country').geometry.y,
                    #lon=world.set_index('Country').geometry.x,
                    locations=tmp.code,
                    #color='Color',#np.log10(-tmp[tmp.columns[2]]),
                    #text=np.round(tmp[tmp.columns[2]].values,2),
                    size="gdp_md_est",
                    opacity=0.8,
                    projection="mercator",
                    hover_name='Country')
                    #animation_frame=np.int64(tmp['Year']), 
                    #animation_group="Country")
colorbar=dict(len=0.75,
                  title=tmp.columns[2], 
                  x=0.9,
                  tickvals = [-2, -1, 0, 1, 2, 2.69],
                  ticktext = ['100','10','1', '-10', '-100', '-500']
                  )
#fig_choro.add_traces(fig.data[0])
'''fig.update_layout(
    autosize=False,
    margin=dict(l=0, r=0, t=0, b=0),
    width=800,
    height=400,
    coloraxis_colorbar=colorbar#dict(title='Count', tickprefix='1.e')    
)'''
fig_map = go.Figure(
            go.Choroplethmapbox(
                geojson=countries,
                locations=tmp['Country'],
                z=map_df[0][key],
                colorscale="Viridis",
                zmin=zmin,
                zmax=zmax,
                colorbar=dict(title="Disponibilité station en %"),
                marker_opacity=0.5,
                marker_line_width=0,
            )
        )
fig_map.show()

NameError: name 'map_df' is not defined

In [86]:
with open("population/data/countries.geojson") as f:
    countries = json.load(f)

for c in countries['features']:
    print(c['properties'])
    c['id'] = c['properties']['ADMIN']
    print(c['id'])

with open('countries.json', 'w') as f:
    json.dump(countries, f)


{'ADMIN': 'Aruba', 'ISO_A3': 'ABW', 'ISO_A2': 'AW'}
Aruba
{'ADMIN': 'Afghanistan', 'ISO_A3': 'AFG', 'ISO_A2': 'AF'}
Afghanistan
{'ADMIN': 'Angola', 'ISO_A3': 'AGO', 'ISO_A2': 'AO'}
Angola
{'ADMIN': 'Anguilla', 'ISO_A3': 'AIA', 'ISO_A2': 'AI'}
Anguilla
{'ADMIN': 'Albania', 'ISO_A3': 'ALB', 'ISO_A2': 'AL'}
Albania
{'ADMIN': 'Aland', 'ISO_A3': 'ALA', 'ISO_A2': 'AX'}
Aland
{'ADMIN': 'Andorra', 'ISO_A3': 'AND', 'ISO_A2': 'AD'}
Andorra
{'ADMIN': 'United Arab Emirates', 'ISO_A3': 'ARE', 'ISO_A2': 'AE'}
United Arab Emirates
{'ADMIN': 'Argentina', 'ISO_A3': 'ARG', 'ISO_A2': 'AR'}
Argentina
{'ADMIN': 'Armenia', 'ISO_A3': 'ARM', 'ISO_A2': 'AM'}
Armenia
{'ADMIN': 'American Samoa', 'ISO_A3': 'ASM', 'ISO_A2': 'AS'}
American Samoa
{'ADMIN': 'Antarctica', 'ISO_A3': 'ATA', 'ISO_A2': 'AQ'}
Antarctica
{'ADMIN': 'Ashmore and Cartier Islands', 'ISO_A3': '-99', 'ISO_A2': 'AQ'}
Ashmore and Cartier Islands
{'ADMIN': 'French Southern and Antarctic Lands', 'ISO_A3': 'ATF', 'ISO_A2': 'TF'}
French Southern and An

In [87]:
with open("countries.json") as f:
    countries = json.load(f)

for c in countries['features']:
    print(c['id'])

Aruba
Afghanistan
Angola
Anguilla
Albania
Aland
Andorra
United Arab Emirates
Argentina
Armenia
American Samoa
Antarctica
Ashmore and Cartier Islands
French Southern and Antarctic Lands
Antigua and Barbuda
Australia
Austria
Azerbaijan
Burundi
Belgium
Benin
Burkina Faso
Bangladesh
Bulgaria
Bahrain
The Bahamas
Bosnia and Herzegovina
Bajo Nuevo Bank (Petrel Is.)
Saint Barthelemy
Belarus
Belize
Bermuda
Bolivia
Brazil
Barbados
Brunei
Bhutan
Botswana
Central African Republic
Canada
Switzerland
Chile
China
Ivory Coast
Clipperton Island
Cameroon
Cyprus No Mans Area
Democratic Republic of the Congo
Republic of Congo
Cook Islands
Colombia
Comoros
Cape Verde
Costa Rica
Coral Sea Islands
Cuba
Curaçao
Cayman Islands
Northern Cyprus
Cyprus
Czech Republic
Germany
Djibouti
Dominica
Denmark
Dominican Republic
Algeria
Ecuador
Egypt
Eritrea
Dhekelia Sovereign Base Area
Spain
Estonia
Ethiopia
Finland
Fiji
Falkland Islands
France
Faroe Islands
Federated States of Micronesia
Gabon
United Kingdom
Georgia
Guer

### Stats

Maintenant on peut afficher une colonne sous forme de courbe :

In [153]:
cur_df = impacts.copy(True).dropna()
cur_df[cur_df.columns[4]] = np.float64(cur_df[cur_df.columns[4]].map(lambda x: str(x)[:-1]))
cur_df[cur_df.columns[5]] = np.float64(cur_df[cur_df.columns[5]].map(lambda x: str(x)[:-1]))
cur_df['Count'] = 1
cur_df

Unnamed: 0,Year,Company Name,Country,Industry (Exiobase),Total Environmental Intensity (Revenue),Total Environmental Intensity (Operating Income),Total Environmental Cost,Working Capacity,Fish Production Capacity,Crop Production Capacity,Meat Production Capacity,Biodiversity,Abiotic Resources,Water production capacity (Drinking water & Irrigation Water),Wood Production Capacity,% Imputed,Count
0,2018,TELEPERFORMANCE SE,France,Activities auxiliary to financial intermediati...,-1.09,-10.05,"(5,52,32,974)","(4,85,90,497)","(11,456)","(6,46,758)","(1,51,520)","(2,061)","(3,661)","(58,28,063)",1042,2%,1
1,2018,SGS SA,Switzerland,Activities auxiliary to financial intermediati...,-0.81,-5.31,"(5,51,43,250)","(5,07,27,341)","(11,763)","(6,73,791)","(1,57,783)","(2,131)","(3,489)","(35,67,524)",571,0%,1
2,2018,INTERTEK GROUP PLC,United Kingdom,Activities auxiliary to financial intermediati...,-1.53,-9.38,"(5,46,77,862)","(5,34,89,006)","(12,428)","(7,11,263)","(1,66,744)","(2,244)","(3,428)","(2,93,207)",458,1%,1
3,2018,APPLUS SERVICES SA,Spain,Activities auxiliary to financial intermediati...,-2.26,-35.02,"(4,32,98,590)","(1,89,12,678)","(4,652)","(2,49,594)","(58,535)",(811),"(3,850)","(2,40,69,048)",578,2%,1
4,2018,BUREAU VERITAS SA,France,Activities auxiliary to financial intermediati...,-0.56,-4.40,"(3,08,62,191)","(3,01,89,038)","(7,276)","(4,02,067)","(94,143)","(1,298)","(3,953)","(1,65,542)",1126,3%,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13172,2010,KINTETSU GROUP HOLDINGS CO LTD,Japan,Transport via railways,-1.39,-36.98,"(16,46,03,915)","(16,12,35,941)","(44,060)","(20,15,875)","(4,70,665)","(7,350)","(20,845)","(8,27,325)",18146,6%,1
13173,2010,NANKAI ELECTRIC RAILWAY CO LTD,Japan,Transport via railways,-1.25,-10.98,"(2,86,93,183)","(2,70,82,423)","(7,584)","(3,35,328)","(78,239)","(1,251)","(4,032)","(11,87,837)",3510,7%,1
13174,2010,MCKESSON CORPORATION,United States of America,"Wholesale trade and commission trade,except of...",-0.04,-2.26,"(4,51,30,939)","(4,19,35,531)","(13,246)","(4,96,341)","(1,16,908)","(1,846)","(83,836)","(24,84,942)",1711,15%,1
13175,2010,"UNITED NATURAL FOODS,INC.",United States of America,"Wholesale trade and commission trade,except of...",-1.01,-33.05,"(3,79,73,013)","(3,71,46,871)","(9,000)","(4,94,356)","(1,15,847)","(1,603)","(2,898)","(2,03,943)",1505,4%,1


In [193]:

df = cur_df.copy(True)
cur_col = df.columns[4]
quantiles = [df[cur_col].quantile((i+1)/100) for i in range(100)]
print(quantiles[:10])
lq = np.float64(-20000000000000)
for q in quantiles:
    df.loc[df[cur_col].between(lq, q, 'neither'),cur_col] = q
    lq = q

df['Amount of values'] = 1
tmp = df.groupby(cur_col)['Amount of values'].sum().reset_index()
tmp.columns = ['Total Environmental Intensity (Revenue)','Amount of values']
print(tmp)
print()
print(tmp.values)
fig = px.histogram(tmp, x=['Total Environmental Intensity (Revenue)', 'Total Environmental Intensity (Operating Income)'], y='Amount of values', log_y=True, nbins=100)
#fig.add_trace(go.Bar(y = tmp, x = tmp.index))
fig.show()
#tmp.plot(x='Total Environmental Intensity (Revenue)', y='Amount of values', kind='hist')

[-343.5231999999995, -177.52879999999965, -118.80560000000123, -85.84639999999973, -81.23, -59.71600000000077, -47.75039999999933, -39.90959999999987, -33.545600000000235, -32.53]
    Total Environmental Intensity (Revenue)  Amount of values
0                                 -343.5232               125
1                                 -177.5288               124
2                                 -118.8056               124
3                                  -85.8464               124
4                                  -81.2300               125
..                                      ...               ...
95                                  -0.0800               103
96                                  -0.0500               144
97                                  -0.0200                86
98                                2855.5040               116
99                                3569.1200               125

[100 rows x 2 columns]

[[-3.435232e+02  1.250000e+02]
 [-1.775288e+02  1.2

In [205]:
def create_dist(in_df,year=None, country=None):
    df = in_df.copy(True)
    if (year):
        df = df.loc[df.Year == year]
    if (country):
        df = df.loc[df.Country == country]

    dfs = []
    
    for i in range(2):
        cur_col = df.columns[4 + i]
        quantiles = [df[cur_col].quantile((i+1)/100) for i in range(100)]
        print(quantiles[:10])
        lq = np.float64(-20000000000000)
        for q in quantiles:
            df.loc[df[cur_col].between(lq, q, 'neither'),cur_col] = q
            lq = q

        df['Amount of values'] = 1
        tmp = df.groupby(cur_col)['Amount of values'].sum().reset_index()
        tmp.columns = [cur_col,'Amount of values']
        dfs.append(tmp)
    
    fig = plotly.subplots.make_subplots(specs=[[{"secondary_y": True}]])
    fig.add_trace(go.Histogram(x=dfs[0]['Total Environmental Intensity (Revenue)'], y=dfs[0]['Amount of values'], nbinsx=100,
                    secondary_y=False,
                    name="Amount of entries (Revenue)"))
    fig.add_trace(go.Histogram(x=dfs[1]['Total Environmental Intensity (Operating Income)'], y=dfs[1]['Amount of values'], 
                    nbinsx=100,
                    secondary_y=True,
                    name="Amount of entries (Operating Income)"))
    fig.show()
create_dist(cur_df)

[-343.5231999999995, -177.52879999999965, -118.80560000000123, -85.84639999999973, -81.23, -59.71600000000077, -47.75039999999933, -39.90959999999987, -33.545600000000235, -32.53]
[-4430.242, -2313.17, -1522.948, -1118.3, -918.74, -765.926, -642.6699999999998, -544.376, -480.702, -415.63]


ValueError: Invalid property specified for object of type plotly.graph_objs.Histogram: 'secondary'

Did you mean "legendrank"?

    Valid properties:
        alignmentgroup
            Set several traces linked to the same position axis or
            matching axes to the same alignmentgroup. This controls
            whether bars compute their positional range dependently
            or independently.
        autobinx
            Obsolete: since v1.42 each bin attribute is auto-
            determined separately and `autobinx` is not needed.
            However, we accept `autobinx: true` or `false` and will
            update `xbins` accordingly before deleting `autobinx`
            from the trace.
        autobiny
            Obsolete: since v1.42 each bin attribute is auto-
            determined separately and `autobiny` is not needed.
            However, we accept `autobiny: true` or `false` and will
            update `ybins` accordingly before deleting `autobiny`
            from the trace.
        bingroup
            Set a group of histogram traces which will have
            compatible bin settings. Note that traces on the same
            subplot and with the same "orientation" under `barmode`
            "stack", "relative" and "group" are forced into the
            same bingroup, Using `bingroup`, traces under `barmode`
            "overlay" and on different axes (of the same axis type)
            can have compatible bin settings. Note that histogram
            and histogram2d* trace can share the same `bingroup`
        cliponaxis
            Determines whether the text nodes are clipped about the
            subplot axes. To show the text nodes above axis lines
            and tick labels, make sure to set `xaxis.layer` and
            `yaxis.layer` to *below traces*.
        constraintext
            Constrain the size of text inside or outside a bar to
            be no larger than the bar itself.
        cumulative
            :class:`plotly.graph_objects.histogram.Cumulative`
            instance or dict with compatible properties
        customdata
            Assigns extra data each datum. This may be useful when
            listening to hover, click and selection events. Note
            that, "scatter" traces also appends customdata items in
            the markers DOM elements
        customdatasrc
            Sets the source reference on Chart Studio Cloud for
            `customdata`.
        error_x
            :class:`plotly.graph_objects.histogram.ErrorX` instance
            or dict with compatible properties
        error_y
            :class:`plotly.graph_objects.histogram.ErrorY` instance
            or dict with compatible properties
        histfunc
            Specifies the binning function used for this histogram
            trace. If "count", the histogram values are computed by
            counting the number of values lying inside each bin. If
            "sum", "avg", "min", "max", the histogram values are
            computed using the sum, the average, the minimum or the
            maximum of the values lying inside each bin
            respectively.
        histnorm
            Specifies the type of normalization used for this
            histogram trace. If "", the span of each bar
            corresponds to the number of occurrences (i.e. the
            number of data points lying inside the bins). If
            "percent" / "probability", the span of each bar
            corresponds to the percentage / fraction of occurrences
            with respect to the total number of sample points
            (here, the sum of all bin HEIGHTS equals 100% / 1). If
            "density", the span of each bar corresponds to the
            number of occurrences in a bin divided by the size of
            the bin interval (here, the sum of all bin AREAS equals
            the total number of sample points). If *probability
            density*, the area of each bar corresponds to the
            probability that an event will fall into the
            corresponding bin (here, the sum of all bin AREAS
            equals 1).
        hoverinfo
            Determines which trace information appear on hover. If
            `none` or `skip` are set, no information is displayed
            upon hovering. But, if `none` is set, click and hover
            events are still fired.
        hoverinfosrc
            Sets the source reference on Chart Studio Cloud for
            `hoverinfo`.
        hoverlabel
            :class:`plotly.graph_objects.histogram.Hoverlabel`
            instance or dict with compatible properties
        hovertemplate
            Template string used for rendering the information that
            appear on hover box. Note that this will override
            `hoverinfo`. Variables are inserted using %{variable},
            for example "y: %{y}" as well as %{xother}, {%_xother},
            {%_xother_}, {%xother_}. When showing info for several
            points, "xother" will be added to those with different
            x positions from the first point. An underscore before
            or after "(x|y)other" will add a space on that side,
            only when this field is shown. Numbers are formatted
            using d3-format's syntax %{variable:d3-format}, for
            example "Price: %{y:$.2f}".
            https://github.com/d3/d3-format/tree/v1.4.5#d3-format
            for details on the formatting syntax. Dates are
            formatted using d3-time-format's syntax
            %{variable|d3-time-format}, for example "Day:
            %{2019-01-01|%A}". https://github.com/d3/d3-time-
            format/tree/v2.2.3#locale_format for details on the
            date formatting syntax. The variables available in
            `hovertemplate` are the ones emitted as event data
            described at this link
            https://plotly.com/javascript/plotlyjs-events/#event-
            data. Additionally, every attributes that can be
            specified per-point (the ones that are `arrayOk: true`)
            are available. variable `binNumber` Anything contained
            in tag `<extra>` is displayed in the secondary box, for
            example "<extra>{fullData.name}</extra>". To hide the
            secondary box completely, use an empty tag
            `<extra></extra>`.
        hovertemplatesrc
            Sets the source reference on Chart Studio Cloud for
            `hovertemplate`.
        hovertext
            Same as `text`.
        hovertextsrc
            Sets the source reference on Chart Studio Cloud for
            `hovertext`.
        ids
            Assigns id labels to each datum. These ids for object
            constancy of data points during animation. Should be an
            array of strings, not numbers or any other type.
        idssrc
            Sets the source reference on Chart Studio Cloud for
            `ids`.
        insidetextanchor
            Determines if texts are kept at center or start/end
            points in `textposition` "inside" mode.
        insidetextfont
            Sets the font used for `text` lying inside the bar.
        legendgroup
            Sets the legend group for this trace. Traces part of
            the same legend group hide/show at the same time when
            toggling legend items.
        legendgrouptitle
            :class:`plotly.graph_objects.histogram.Legendgrouptitle
            ` instance or dict with compatible properties
        legendrank
            Sets the legend rank for this trace. Items and groups
            with smaller ranks are presented on top/left side while
            with `*reversed* `legend.traceorder` they are on
            bottom/right side. The default legendrank is 1000, so
            that you can use ranks less than 1000 to place certain
            items before all unranked items, and ranks greater than
            1000 to go after all unranked items.
        marker
            :class:`plotly.graph_objects.histogram.Marker` instance
            or dict with compatible properties
        meta
            Assigns extra meta information associated with this
            trace that can be used in various text attributes.
            Attributes such as trace `name`, graph, axis and
            colorbar `title.text`, annotation `text`
            `rangeselector`, `updatemenues` and `sliders` `label`
            text all support `meta`. To access the trace `meta`
            values in an attribute in the same trace, simply use
            `%{meta[i]}` where `i` is the index or key of the
            `meta` item in question. To access trace `meta` in
            layout attributes, use `%{data[n[.meta[i]}` where `i`
            is the index or key of the `meta` and `n` is the trace
            index.
        metasrc
            Sets the source reference on Chart Studio Cloud for
            `meta`.
        name
            Sets the trace name. The trace name appear as the
            legend item and on hover.
        nbinsx
            Specifies the maximum number of desired bins. This
            value will be used in an algorithm that will decide the
            optimal bin size such that the histogram best
            visualizes the distribution of the data. Ignored if
            `xbins.size` is provided.
        nbinsy
            Specifies the maximum number of desired bins. This
            value will be used in an algorithm that will decide the
            optimal bin size such that the histogram best
            visualizes the distribution of the data. Ignored if
            `ybins.size` is provided.
        offsetgroup
            Set several traces linked to the same position axis or
            matching axes to the same offsetgroup where bars of the
            same position coordinate will line up.
        opacity
            Sets the opacity of the trace.
        orientation
            Sets the orientation of the bars. With "v" ("h"), the
            value of the each bar spans along the vertical
            (horizontal).
        outsidetextfont
            Sets the font used for `text` lying outside the bar.
        selected
            :class:`plotly.graph_objects.histogram.Selected`
            instance or dict with compatible properties
        selectedpoints
            Array containing integer indices of selected points.
            Has an effect only for traces that support selections.
            Note that an empty array means an empty selection where
            the `unselected` are turned on for all points, whereas,
            any other non-array values means no selection all where
            the `selected` and `unselected` styles have no effect.
        showlegend
            Determines whether or not an item corresponding to this
            trace is shown in the legend.
        stream
            :class:`plotly.graph_objects.histogram.Stream` instance
            or dict with compatible properties
        text
            Sets hover text elements associated with each bar. If a
            single string, the same string appears over all bars.
            If an array of string, the items are mapped in order to
            the this trace's coordinates.
        textangle
            Sets the angle of the tick labels with respect to the
            bar. For example, a `tickangle` of -90 draws the tick
            labels vertically. With "auto" the texts may
            automatically be rotated to fit with the maximum size
            in bars.
        textfont
            Sets the text font.
        textposition
            Specifies the location of the `text`. "inside"
            positions `text` inside, next to the bar end (rotated
            and scaled if needed). "outside" positions `text`
            outside, next to the bar end (scaled if needed), unless
            there is another bar stacked on this one, then the text
            gets pushed inside. "auto" tries to position `text`
            inside the bar, but if the bar is too small and no bar
            is stacked on this one the text is moved outside. If
            "none", no text appears.
        textsrc
            Sets the source reference on Chart Studio Cloud for
            `text`.
        texttemplate
            Template string used for rendering the information text
            that appear on points. Note that this will override
            `textinfo`. Variables are inserted using %{variable},
            for example "y: %{y}". Numbers are formatted using
            d3-format's syntax %{variable:d3-format}, for example
            "Price: %{y:$.2f}".
            https://github.com/d3/d3-format/tree/v1.4.5#d3-format
            for details on the formatting syntax. Dates are
            formatted using d3-time-format's syntax
            %{variable|d3-time-format}, for example "Day:
            %{2019-01-01|%A}". https://github.com/d3/d3-time-
            format/tree/v2.2.3#locale_format for details on the
            date formatting syntax. Every attributes that can be
            specified per-point (the ones that are `arrayOk: true`)
            are available. variables `label` and `value`.
        uid
            Assign an id to this trace, Use this to provide object
            constancy between traces during animations and
            transitions.
        uirevision
            Controls persistence of some user-driven changes to the
            trace: `constraintrange` in `parcoords` traces, as well
            as some `editable: true` modifications such as `name`
            and `colorbar.title`. Defaults to `layout.uirevision`.
            Note that other user-driven trace attribute changes are
            controlled by `layout` attributes: `trace.visible` is
            controlled by `layout.legend.uirevision`,
            `selectedpoints` is controlled by
            `layout.selectionrevision`, and `colorbar.(x|y)`
            (accessible with `config: {editable: true}`) is
            controlled by `layout.editrevision`. Trace changes are
            tracked by `uid`, which only falls back on trace index
            if no `uid` is provided. So if your app can add/remove
            traces before the end of the `data` array, such that
            the same trace has a different index, you can still
            preserve user-driven changes if you give each trace a
            `uid` that stays with it as it moves.
        unselected
            :class:`plotly.graph_objects.histogram.Unselected`
            instance or dict with compatible properties
        visible
            Determines whether or not this trace is visible. If
            "legendonly", the trace is not drawn, but can appear as
            a legend item (provided that the legend itself is
            visible).
        x
            Sets the sample data to be binned on the x axis.
        xaxis
            Sets a reference between this trace's x coordinates and
            a 2D cartesian x axis. If "x" (the default value), the
            x coordinates refer to `layout.xaxis`. If "x2", the x
            coordinates refer to `layout.xaxis2`, and so on.
        xbins
            :class:`plotly.graph_objects.histogram.XBins` instance
            or dict with compatible properties
        xcalendar
            Sets the calendar system to use with `x` date data.
        xhoverformat
            Sets the hover text formatting rulefor `x`  using d3
            formatting mini-languages which are very similar to
            those in Python. For numbers, see:
            https://github.com/d3/d3-format/tree/v1.4.5#d3-format.
            And for dates see: https://github.com/d3/d3-time-
            format/tree/v2.2.3#locale_format. We add two items to
            d3's date formatter: "%h" for half of the year as a
            decimal number as well as "%{n}f" for fractional
            seconds with n digits. For example, *2016-10-13
            09:15:23.456* with tickformat "%H~%M~%S.%2f" would
            display *09~15~23.46*By default the values are
            formatted using `xaxis.hoverformat`.
        xsrc
            Sets the source reference on Chart Studio Cloud for
            `x`.
        y
            Sets the sample data to be binned on the y axis.
        yaxis
            Sets a reference between this trace's y coordinates and
            a 2D cartesian y axis. If "y" (the default value), the
            y coordinates refer to `layout.yaxis`. If "y2", the y
            coordinates refer to `layout.yaxis2`, and so on.
        ybins
            :class:`plotly.graph_objects.histogram.YBins` instance
            or dict with compatible properties
        ycalendar
            Sets the calendar system to use with `y` date data.
        yhoverformat
            Sets the hover text formatting rulefor `y`  using d3
            formatting mini-languages which are very similar to
            those in Python. For numbers, see:
            https://github.com/d3/d3-format/tree/v1.4.5#d3-format.
            And for dates see: https://github.com/d3/d3-time-
            format/tree/v2.2.3#locale_format. We add two items to
            d3's date formatter: "%h" for half of the year as a
            decimal number as well as "%{n}f" for fractional
            seconds with n digits. For example, *2016-10-13
            09:15:23.456* with tickformat "%H~%M~%S.%2f" would
            display *09~15~23.46*By default the values are
            formatted using `yaxis.hoverformat`.
        ysrc
            Sets the source reference on Chart Studio Cloud for
            `y`.
        
Did you mean "legendrank"?

Bad property path:
secondary_y
^^^^^^^^^

In [216]:
fig = plotly.subplots.make_subplots()
fig.add_trace(go.Histogram(x=cur_df['Total Environmental Intensity (Revenue)'], nbinsx=100,
                #secondary_y=True,
                name="Amount of entries (Revenue)"))
fig.add_trace(go.Histogram(x=cur_df['Total Environmental Intensity (Operating Income)'], 
                    nbinsx=100,
                    #secondary_y=True,
                    name="Amount of entries (Operating Income)"))
fig.update_yaxes(type="log")
fig.show()