# Step 1 
This step shows how to fetch the reference list of a review paper (reference number, DOI, title and year) online using the CrossRef API.

Enter the DOI of the paper at `# Example DOI` and set the path where the .csv file will be saved.

This step only creates the reference list with its details. In the next step you will have to enter the data from the tables inside the papers, either manually or using tabula.org.
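
For the manual input mentioned above, it may help to start from an empty template whose column names match the table-info files used in Step 2, so that the `Reference Number` column lines up with the reference list. The sketch below is only one possible way to produce such a template. The column names are copied from the tables shown in Step 2; `make_table_info_template` and `TABLE_INFO_COLUMNS` are hypothetical names that do not appear in the original notebook.

```python
import io
import pandas as pd

# Column names copied from the table-info files shown in Step 2.
TABLE_INFO_COLUMNS = [
    "Type of Material",
    "Anode",
    "Size of Anode",
    "Surface Area of Anode cm2",
    "Inoculum Source/",
    "Power Density mw/m2",
    "Reference Number",
]

def make_table_info_template(n_rows=0):
    """Return an empty DataFrame with the expected columns (hypothetical helper)."""
    return pd.DataFrame(index=range(n_rows), columns=TABLE_INFO_COLUMNS)

template = make_table_info_template(3)

# Written to an in-memory buffer here; pass a file path to to_csv to save it instead.
buffer = io.StringIO()
template.to_csv(buffer, index=False)
template_csv = buffer.getvalue()
print(template_csv)
```

Keeping the `Reference Number` header identical in both files is what allows the merge in Step 2 to work without renaming columns.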


In [1]:
import requests
import pandas as pd

# Function to fetch references using the CrossRef API
def get_references_from_doi(doi):
    url = f"https://api.crossref.org/works/{doi}"
    response = requests.get(url)
    
    if response.status_code == 200:
        data = response.json()
        references = data.get('message', {}).get('reference', [])
        return references
    else:
        print(f"Error fetching data for DOI {doi}: {response.status_code}")
        return None

# Helper function to extract necessary information for the table
def extract_reference_info(ref, order):
    # Extract the title
    title = ref.get('article-title', 'No Title')
    
    # Reference order number
    reference_number = order + 1  # Start from 1
    
    # Extract DOI if available
    ref_doi = ref.get('DOI', 'No DOI')
    
    # Extract the year of publication
    year = ref.get('year', 'No Year')
    
    # Return a dictionary with only the desired columns
    return {
        "Title": title,
        "Reference Number": reference_number,
        "DOI": ref_doi,
        "Year": year
    }

# Example DOI
doi = "10.1016/j.jpowsour.2021.230687"  # Replace with your DOI
references = get_references_from_doi(doi)

# Create a DataFrame from the extracted references information
if references:
    reference_data = [extract_reference_info(ref, i) for i, ref in enumerate(references)]
    df = pd.DataFrame(reference_data, columns=["Title", "Reference Number", "DOI", "Year"])
    
    # Display the table
    print(df)
    
    # Optionally, save to CSV
    output_csv_path = 'C:\\Users\\pedro\\Desktop\\Materials World\\Reference list He 2021.csv'
    df.to_csv(output_csv_path, index=False)
    print(f"References table saved to {output_csv_path}")
else:
    print("No references found.")

                                                Title  Reference Number  \
0   A comprehensive review on emerging constructed...                 1   
1   Electroactive microorganisms in bioelectrochem...                 2   
2    Microbial fuel cells: methodology and technology                 3   
3   Advances in microbial fuel cells for wastewate...                 4   
4   Conversion of wastes into bioelectricity and c...                 5   
5   Outlook on the role of microbial fuel cells in...                 6   
6   Development and modification of materials to b...                 7   
7   Application of advanced anodes in microbial fu...                 8   
8   Biosurfactants and synthetic surfactants in bi...                 9   
9   Mini-review: anode modification for improved p...                10   
10  A comprehensive review on microbial fuel cell ...                11   
11                                           No Title                12   
12  Polypyrrole modified 

# Step 2 - Merge two data frames based on common values in Pandas
Now you will merge the two data frames, *a) reference list* and *b) table with info*, based on the common values of the **reference number** present in both data frames.

This step must be repeated for every new paper studied.

Below is an example of merging the reference list and the table with info from a review paper by Yaqoob et al. 2020 (doi:10.3390/ma13092078).
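
Before running the merge on real files, it can be tried on toy data. The sketch below uses made-up titles, DOIs and values purely for illustration. It shows that `pd.merge` performs an inner join by default, so table rows whose reference number is missing from the reference list are silently dropped, and that a left join with `indicator=True` can make those rows visible.

```python
import pandas as pd

# Toy reference list in the shape produced by Step 1 (values are made up).
reference_list = pd.DataFrame({
    "Reference Number": [1, 2, 3],
    "Title": ["Paper A", "Paper B", "Paper C"],
    "DOI": ["10.1000/a", "10.1000/b", "No DOI"],
    "Year": [2019, 2020, 2021],
})

# Toy table info; reference 7 has no match in the reference list.
table_info = pd.DataFrame({
    "Anode": ["Carbon Cloth", "Graphite Felt", "Carbon Brush"],
    "Power Density mw/m2": [28.0, 149.0, 14.5],
    "Reference Number": [2, 3, 7],
})

# Default inner join: only reference numbers present in both frames survive.
inner = pd.merge(table_info, reference_list, on="Reference Number")

# Left join with an indicator column to list table rows without a matching reference.
checked = pd.merge(table_info, reference_list, on="Reference Number",
                   how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(inner)
print(unmatched[["Anode", "Reference Number"]])
```

If `unmatched` is not empty, the reference numbers typed into the table info are worth double-checking against the paper.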

In [4]:
reference_list_he2021 = pd.read_csv("Reference list He 2021.csv")
reference_list_he2021.head()


Unnamed: 0,Title,Reference Number,DOI,Year
0,A comprehensive review on emerging constructed...,1,No DOI,2020
1,Electroactive microorganisms in bioelectrochem...,2,10.1038/s41579-019-0173-x,2019
2,Microbial fuel cells: methodology and technology,3,10.1021/es0605016,2006
3,Advances in microbial fuel cells for wastewate...,4,10.1016/j.rser.2016.12.069,2017
4,Conversion of wastes into bioelectricity and c...,5,10.1126/science.1217412,2012


In [5]:
table_info_he2021 = pd.read_csv("Table info_He 2020.csv")
table_info_he2021.head()

Unnamed: 0,Type of Material,Anode,Size of Anode,Surface Area of Anode cm2,Inoculum Source/,Power Density mw/m2,Reference Number
0,Carbon-based,PANI-Sodium Alginate-Carbon Brush,,,Mixed Culture,520,18
1,Carbon-based,FeS2-decorated Graphene,,,Mixed Culture,3222,31
2,Carbon-based,3D N-doped carbon foam,,,Mixed Culture,4999,34
3,Carbon-based,3D printed carbonaceous porous,,,S.Oneidensis,230,15


In [7]:
he2021_merged = pd.merge(table_info_he2021, reference_list_he2021, on="Reference Number")
he2021_merged.to_csv('Merged He 2021.csv')

In [26]:
he2021_merged.head(5)

Unnamed: 0,Type of Material,Anode,Size of Anode,Surface Area of Anode cm2,Inoculum Source/,Power Density mw/m2,Reference Number,Title,DOI,Year
0,Carbon-based,PANI-Sodium Alginate-Carbon Brush,,,Mixed Culture,520,18,Enhanced performance of microbial fuel cell wi...,10.1016/j.energy.2020.117780,2020
1,Carbon-based,FeS2-decorated Graphene,,,Mixed Culture,3222,31,FeS2 nanoparticles decorated graphene as micro...,10.1002/adma.201800618,2018
2,Carbon-based,3D N-doped carbon foam,,,Mixed Culture,4999,34,High power generation in mixed-culture microbi...,10.1016/j.cej.2020.125848,2020
3,Carbon-based,3D printed carbonaceous porous,,,S.Oneidensis,230,15,High performance of microbial fuel cell afford...,10.1016/j.electacta.2019.135243,2020


Now repeat the process for another review paper, Jalili 2024 (doi.org/10.1016/j.heliyon.2024.e25439).
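
DOIs are often copied together with a resolver prefix such as `doi.org/`. A minimal sketch, assuming the CrossRef `works` endpoint is queried with the bare DOI as in the Step 1 example, of stripping such prefixes before building the request URL. `normalize_doi` is a hypothetical helper and is not part of the original notebook.

```python
import re

def normalize_doi(raw):
    """Strip common prefixes so only the bare DOI (10.xxxx/...) remains (hypothetical helper)."""
    doi = raw.strip()
    # Remove an optional scheme followed by a (dx.)doi.org host.
    doi = re.sub(r"^(?:https?://)?(?:dx\.)?doi\.org/", "", doi, flags=re.IGNORECASE)
    # Remove a leading "doi:" label.
    doi = re.sub(r"^doi:\s*", "", doi, flags=re.IGNORECASE)
    return doi

print(normalize_doi("doi.org/10.1016/j.heliyon.2024.e25439"))
print(normalize_doi("https://doi.org/10.1016/j.jpowsour.2021.230687"))
print(normalize_doi("doi:10.3390/ma13092078"))
```

The result could then be passed to `get_references_from_doi` in place of the raw string.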

In [None]:
def get_references_from_doi(doi):
    url = f"https://api.crossref.org/works/{doi}"
    response = requests.get(url)
    
    if response.status_code == 200:
        data = response.json()
        references = data.get('message', {}).get('reference', [])
        return references
    else:
        print(f"Error fetching data for DOI {doi}: {response.status_code}")
        return None

# Helper function to extract necessary information for the table
def extract_reference_info(ref, order):
    # Extract the title
    title = ref.get('article-title', 'No Title')
    
    # Reference order number
    reference_number = order + 1  # Start from 1
    
    # Extract DOI if available
    ref_doi = ref.get('DOI', 'No DOI')
    
    # Extract the year of publication
    year = ref.get('year', 'No Year')
    
    # Return a dictionary with only the desired columns
    return {
        "Title": title,
        "Reference Number": reference_number,
        "DOI": ref_doi,
        "Year": year
    }

# Example DOI (bare DOI, without the "doi.org/" prefix)
doi = "10.1016/j.heliyon.2024.e25439"  # Replace with your DOI
references = get_references_from_doi(doi)

# Create a DataFrame from the extracted references information
if references:
    reference_data = [extract_reference_info(ref, i) for i, ref in enumerate(references)]
    df = pd.DataFrame(reference_data, columns=["Title", "Reference Number", "DOI", "Year"])
    
    # Display the table
    print(df)
    
    # Save to CSV
    output_csv_path = 'C:\\Users\\pedro\\Desktop\\Materials World\\Reference list Jalili 2024.csv'
    df.to_csv(output_csv_path, index=False)
    print(f"References table saved to {output_csv_path}")
else:
    print("No references found.")

                                                 Title  Reference Number  \
0    Analysis of ammonia loss mechanisms in microbi...                 1   
1      Towards a science of climate and energy choices                 2   
2    Global energy perspectives to 2060–WEC's world...                 3   
3    Renewable energy and sustainable development: ...                 4   
4    A novel microbial fuel cell stack for continuo...                 5   
..                                                 ...               ...   
186  Microbial phenazine production enhances electr...               187   
187  Metabolites produced by Pseudomonas sp. enable...               188   
188  Anodic biofilms in microbial fuel cells harbor...               189   
189  Microfluidic microbial fuel cell: on-chip auto...               190   
190  Sediment microbial fuel cells as a barrier to ...               191   

                                DOI  Year  
0                 10.1002/bit.21687  2008  

In [41]:
reference_list_jalili2024 = pd.read_csv("Reference list Jalili 2024.csv")
reference_list_jalili2024.head()

Unnamed: 0,Title,Reference Number,DOI,Year
0,Analysis of ammonia loss mechanisms in microbi...,1,10.1002/bit.21687,2008
1,Towards a science of climate and energy choices,2,10.1038/nclimate3027,2016
2,Global energy perspectives to 2060–WEC's world...,3,10.1016/j.esr.2020.100523,2020
3,Renewable energy and sustainable development: ...,4,10.1016/S1364-0321(99)00011-8,2000
4,A novel microbial fuel cell stack for continuo...,5,10.1016/j.ijhydene.2011.12.154,2012


In [42]:
table_info_jalili2024 = pd.read_csv("Table info_Jalili 2024.csv")
table_info_jalili2024.head()

Unnamed: 0,Type of Material,Anode,Size of Anode,Surface Area of Anode cm2,Inoculum Source/,Power Density mw/m2,Reference Number
0,Carbon-based,Carbon Cloth,,,Innoculum Source,28.0,10
1,Carbon-based,Graphite Rod and Carbon Cloth,,,Domestic wastewater,229.0,11
2,Carbon-based,Graphite Rod and Carbon Cloth,,,Domestic wastewater,1200.0,12
3,Carbon-based,Graphite Felt,,,Domestic wastewater,149.0,13
4,Carbon-based,Graphite Felt,,,Domestic wastewater,3.226,14


In [43]:
jalili2024_merged = pd.merge(table_info_jalili2024, reference_list_jalili2024, on="Reference Number")
jalili2024_merged.to_csv('Merged Jalili 2024.csv')

# Step 3: Concatenating all the merged data frames created from the multiple papers

In [34]:
he2021_merged.drop(columns=['Reference Number'],inplace=True)
he2021_merged.head(5)

Unnamed: 0,Type of Material,Anode,Size of Anode,Surface Area of Anode cm2,Inoculum Source/,Power Density mw/m2,Title,DOI,Year
0,Carbon-based,PANI-Sodium Alginate-Carbon Brush,,,Mixed Culture,520,Enhanced performance of microbial fuel cell wi...,10.1016/j.energy.2020.117780,2020
1,Carbon-based,FeS2-decorated Graphene,,,Mixed Culture,3222,FeS2 nanoparticles decorated graphene as micro...,10.1002/adma.201800618,2018
2,Carbon-based,3D N-doped carbon foam,,,Mixed Culture,4999,High power generation in mixed-culture microbi...,10.1016/j.cej.2020.125848,2020
3,Carbon-based,3D printed carbonaceous porous,,,S.Oneidensis,230,High performance of microbial fuel cell afford...,10.1016/j.electacta.2019.135243,2020


In [32]:
combined_actual.drop(columns=['duplicated'],inplace=True)
combined_actual.head()

Unnamed: 0,Type of Material,Anode,Size of Anode,Surface Area of Anode cm2,Inoculum Source/,Power Density mw/m2,Title,DOI,Year
0,Carbon-based,Carbon Cloth,,,Innoculum Source,28.0,Full-loop operation and cathodic acidification...,10.1016/j.biortech.2011.02.098,2011
1,Carbon-based,Graphite Rod and Carbon Cloth,,,Domestic waste,229.0,Power recovery with multi-anode/cathode microb...,10.1016/j.ijhydene.2010.04.136,2010
2,Carbon-based,Graphite Rod and Carbon Cloth,,,Domestic waste,1200.0,A pilot-scale study on utilizing multi-anode/c...,10.1016/j.ijhydene.2010.08.074,2011
3,Carbon-based,Graphite Felt,,,Domestic waste,149.0,Electricity generation and microbial community...,10.1016/j.biortech.2012.04.078,2012
4,Carbon-based,Graphite Felt,,,Domestic waste,3.226,Scalable microbial fuel cell (MFC) stack for c...,10.1016/j.biortech.2011.11.019,2012


In [37]:
merged_individual = pd.read_csv("Merged Individual Input 2024.csv")

In [45]:
merged_individual.drop(columns=['Unnamed: 0'],inplace=True)
merged_individual.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Type of Material           5 non-null      object 
 1   Anode                      5 non-null      object 
 2   Size of Anode              0 non-null      float64
 3   Surface Area of Anode cm2  4 non-null      float64
 4   Inoculum Source/           5 non-null      object 
 5   Power Density mw/m2        5 non-null      float64
 6   Title                      5 non-null      object 
 7   DOI                        5 non-null      object 
 8   Year                       5 non-null      int64  
dtypes: float64(3), int64(1), object(5)
memory usage: 488.0+ bytes


In [46]:
# Concatenate two or more DataFrames into a single one
combined_df = pd.concat([combined_actual, he2021_merged, merged_individual], ignore_index=True)




# Step 4: Removing duplicates (DOI)

In [47]:
combined_df = combined_df.drop_duplicates(subset='DOI', keep='first')
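
One caveat worth checking: Step 1 fills missing DOIs with the placeholder `'No DOI'`, so `drop_duplicates(subset='DOI')` treats all such rows as duplicates of each other and keeps only the first. The sketch below, on made-up data, removes repeated DOIs while keeping every row that has no real DOI. `drop_duplicate_dois` is a hypothetical helper, offered as one possible approach rather than the notebook's own method.

```python
import pandas as pd

def drop_duplicate_dois(df, placeholder="No DOI"):
    """Drop repeated DOIs but keep every row whose DOI is missing or a placeholder."""
    has_doi = df["DOI"].notna() & (df["DOI"] != placeholder)
    # duplicated() marks the second and later occurrences of each DOI value.
    is_repeat = df.duplicated(subset="DOI", keep="first")
    return df[~(has_doi & is_repeat)].reset_index(drop=True)

toy = pd.DataFrame({
    "Anode": ["Carbon Cloth", "Carbon Cloth", "Graphite Felt", "Carbon Brush"],
    "DOI": ["10.1000/a", "10.1000/a", "No DOI", "No DOI"],
})

naive = toy.drop_duplicates(subset="DOI", keep="first")
careful = drop_duplicate_dois(toy)

print(len(naive))    # rows left after the plain drop_duplicates
print(len(careful))  # rows left when placeholder rows are preserved
```

Whether rows without a DOI should be kept is a judgment call; they may still be true duplicates and could be compared by title instead.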

In [51]:
combined_df.head()

Unnamed: 0,Type of Material,Anode,Size of Anode,Surface Area of Anode cm2,Inoculum Source/,Power Density mw/m2,Title,DOI,Year
0,Carbon-based,Carbon Cloth,,,Innoculum Source,28.0,Full-loop operation and cathodic acidification...,10.1016/j.biortech.2011.02.098,2011
1,Carbon-based,Graphite Rod and Carbon Cloth,,,Domestic wastewater,229.0,Power recovery with multi-anode/cathode microb...,10.1016/j.ijhydene.2010.04.136,2010
2,Carbon-based,Graphite Rod and Carbon Cloth,,,Domestic wastewater,1200.0,A pilot-scale study on utilizing multi-anode/c...,10.1016/j.ijhydene.2010.08.074,2011
3,Carbon-based,Graphite Felt,,,Domestic wastewater,149.0,Electricity generation and microbial community...,10.1016/j.biortech.2012.04.078,2012
4,Carbon-based,Graphite Felt,,,Domestic wastewater,3.226,Scalable microbial fuel cell (MFC) stack for c...,10.1016/j.biortech.2011.11.019,2012


In [49]:
combined_df.to_csv('combined_version.csv', index=False)

You can concatenate more merged tables from other papers, but don't forget to drop the 'Reference Number' column after merging the reference list and the table info.

After you concatenate a new merged table, use the drop_duplicates method again.
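
The routine described above (drop 'Reference Number', concatenate, remove duplicates) can be wrapped in a small function so it is applied the same way for every new paper. This is a minimal sketch on made-up data; `add_merged_table` is a hypothetical helper name.

```python
import pandas as pd

def add_merged_table(combined, new_merged):
    """Append a newly merged table to the combined one and drop repeated DOIs."""
    # 'Reference Number' only has meaning within a single paper, so drop it if present.
    new_merged = new_merged.drop(columns=["Reference Number"], errors="ignore")
    out = pd.concat([combined, new_merged], ignore_index=True)
    return out.drop_duplicates(subset="DOI", keep="first").reset_index(drop=True)

combined = pd.DataFrame({
    "Anode": ["Carbon Cloth", "Graphite Felt"],
    "DOI": ["10.1000/a", "10.1000/b"],
})
new_paper = pd.DataFrame({
    "Anode": ["Graphite Felt", "Carbon Brush"],
    "Reference Number": [4, 9],
    "DOI": ["10.1000/b", "10.1000/c"],
})

combined = add_merged_table(combined, new_paper)
print(combined)
```

Dropping the column before concatenating avoids the extra, mostly empty 'Reference Number' column that otherwise appears in the combined table.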

In [7]:
combined_1_version_df=pd.read_csv('combined_1_version.csv')

In [8]:
combined_df_2 = pd.concat([combined_1_version_df, tcai2020_merged], ignore_index=True)
combined_df_2.head()

Unnamed: 0,Type of Material,Anode,Size of Anode,Surface Area of Anode cm2,Inoculum Source/,Power Density mw/m2,Title,DOI,Year,Reference Number
0,Carbon-based,Carbon Cloth,,,Innoculum Source,28.0,Full-loop operation and cathodic acidification...,10.1016/j.biortech.2011.02.098,2011,
1,Carbon-based,Graphite Rod and Carbon Cloth,,,Domestic wastewater,229.0,Power recovery with multi-anode/cathode microb...,10.1016/j.ijhydene.2010.04.136,2010,
2,Carbon-based,Graphite Rod and Carbon Cloth,,,Domestic wastewater,1200.0,A pilot-scale study on utilizing multi-anode/c...,10.1016/j.ijhydene.2010.08.074,2011,
3,Carbon-based,Graphite Felt,,,Domestic wastewater,149.0,Electricity generation and microbial community...,10.1016/j.biortech.2012.04.078,2012,
4,Carbon-based,Graphite Felt,,,Domestic wastewater,3.226,Scalable microbial fuel cell (MFC) stack for c...,10.1016/j.biortech.2011.11.019,2012,


In [None]:
combined_df = combined_df_2.drop('Reference Number', axis=1)

In [49]:
combined_df_2.head(10)

Unnamed: 0,Type of Material,Anode,Size of Anode,Surface Area of Anode cm2,Inoculum Source/,Power Density mw/m2,Title,DOI,Year,duplicated
0,Carbon-based,Carbon Cloth,,,Innoculum Source,28.0,Full-loop operation and cathodic acidification...,10.1016/j.biortech.2011.02.098,2011,False
1,Carbon-based,Graphite Rod and Carbon Cloth,,,Domestic wastewater,229.0,Power recovery with multi-anode/cathode microb...,10.1016/j.ijhydene.2010.04.136,2010,False
2,Carbon-based,Graphite Rod and Carbon Cloth,,,Domestic wastewater,1200.0,A pilot-scale study on utilizing multi-anode/c...,10.1016/j.ijhydene.2010.08.074,2011,False
3,Carbon-based,Graphite Felt,,,Domestic wastewater,149.0,Electricity generation and microbial community...,10.1016/j.biortech.2012.04.078,2012,False
4,Carbon-based,Graphite Felt,,,Domestic wastewater,3.226,Scalable microbial fuel cell (MFC) stack for c...,10.1016/j.biortech.2011.11.019,2012,False
5,Carbon-based,Graphite Felt,,,Swine wastewater,0.097,Long-term evaluation of a 10-liter serpentine-...,10.1016/j.biortech.2012.07.038,2012,False
6,Carbon-based,Carbon Brush,,,Malt wastewater,4.71,In situ investigation of tubular microbial fue...,10.1016/j.biortech.2013.02.107,2013,False
7,Carbon-based,Carbon Brush,,,Domestic wastewater,14.61,Long-term performance of liter-scale microbial...,10.1021/es400631r,2013,False
8,Carbon-based,Carbon Brush,,,Domestic wastewater,14.5,A horizontal plug flow and stackable pilot mic...,10.1016/j.biortech.2013.12.104,2014,False
9,Carbon-based,Carbon Brush,,,Domestic wastewater,159.0,A 90-liter stackable baffled microbial fuel ce...,10.1016/j.biortech.2015.06.026,2015,False


In [46]:
combined_df_2['duplicated'].value_counts()

False    93
True      8
Name: duplicated, dtype: int64
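
The cell that created the `duplicated` column is not shown. One way such a flag could be produced, offered here as an assumption rather than the original method, is `DataFrame.duplicated` on the `DOI` column. The data below is made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "Anode": ["Carbon Cloth", "Graphite Felt", "Carbon Cloth", "Carbon Brush"],
    "DOI": ["10.1000/a", "10.1000/b", "10.1000/a", "10.1000/c"],
})

# True for the second and later occurrences of a DOI, False for the first.
toy["duplicated"] = toy.duplicated(subset="DOI", keep="first")
print(toy["duplicated"].value_counts())

# Keep only the rows that are not flagged, then remove the helper column.
deduplicated = toy[~toy["duplicated"]].drop(columns=["duplicated"])
print(deduplicated)
```

Flagging first and filtering afterwards makes it possible to inspect the duplicated rows before they are removed.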

In [48]:
combined_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 102 entries, 0 to 101
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Type of Material           102 non-null    object 
 1   Anode                      102 non-null    object 
 2   Size of Anode              41 non-null     object 
 3   Surface Area of Anode cm2  63 non-null     object 
 4   Inoculum Source/           102 non-null    object 
 5   Power Density mw/m2        102 non-null    float64
 6   Title                      102 non-null    object 
 7   DOI                        102 non-null    object 
 8   Year                       102 non-null    int64  
dtypes: float64(1), int64(1), object(7)
memory usage: 8.0+ KB
