# Non-blind Submission

In [1]:
import pandas as pd
import json
from openai import OpenAI
from datetime import datetime
import os
import glob

def get_prompt_for_property(property_name):
    """Get evaluation prompt for a specific property."""
    prompts = {
        'top5_desk': "In your capacity as an editorial board/reviewer for one of the most prestigious and highly selective top-5 economics journals (such as Econometrica, Journal of Political Economy, or The Quarterly Journal of Economics), please determine whether you would allow this submission to proceed past the desk review stage (0 = you would definitely reject at desk, 10 = you would definitely advance to peer review).",
        'top5_accept': "In your capacity as a reviewer for one of the most prestigious and highly selective top-5 economics journals (such as Econometrica, Journal of Political Economy, or The Quarterly Journal of Economics), please determine whether you would recommend this submission for publication (0 = you would definitely reject, 10 = you would definitely recommend for publication).",
        'top5_accept_rating': "In your capacity as a reviewer for one of the most prestigious and highly selective top-5 economics journals (such as Econometrica, Journal of Political Economy, or The Quarterly Journal of Economics), please determine whether you would recommend this submission for publication using the following 7-point scale: 1 = Definite Reject: Fatal flaws in theory/methodology, insufficient contribution, or serious validity concerns that make the paper unsuitable for the journal, 2 = Reject with Option to Resubmit: Significant issues with theory, methodology, or contribution, but potentially salvageable with major revisions and fresh review, 3 = Major Revision: Substantial changes needed to theory, empirics, or exposition, but the core contribution is promising enough to warrant another round, 4 = Minor Revision: Generally strong paper with few small changes needed in exposition, robustness checks, or literature discussion, 5 = Very Minor Revision: Excellent contribution needing only technical corrections or minor clarifications, 6 = Accept As Is: Exceptional contribution ready for immediate publication",
        'top5_accept_rating_criteria': "In your capacity as a reviewer for one of the most prestigious and highly selective top-5 economics journals (such as Econometrica, Journal of Political Economy, or The Quarterly Journal of Economics), please determine whether you would recommend this submission for publication using the following 7-point scale: 1 = Definite Reject: Fatal flaws in theory/methodology, insufficient contribution, or serious validity concerns that make the paper unsuitable for the journal, 2 = Reject with Option to Resubmit: Significant issues with theory, methodology, or contribution, but potentially salvageable with major revisions and fresh review, 3 = Major Revision: Substantial changes needed to theory, empirics, or exposition, but the core contribution is promising enough to warrant another round, 4 = Minor Revision: Generally strong paper with few small changes needed in exposition, robustness checks, or literature discussion, 5 = Very Minor Revision: Excellent contribution needing only technical corrections or minor clarifications, 6 = Accept As Is: Exceptional contribution ready for immediate publication; Papers published in the Top 5 economics journals (American Economic Review, Quarterly Journal of Economics, Journal of Political Economy, Econometrica, and Review of Economic Studies) are often distinguished from those in other journals by several key factors: 1. Depth of Contribution Originality and Innovation: Top 5 papers typically address questions of broad, foundational importance or propose groundbreaking methodologies. They often set new standards in the field or open new research avenues. Generalisability: Findings are relevant to a wide range of settings, not just niche contexts. Big Questions: These papers tackle issues with substantial implications for policy, theory, or practice. 2. Methodological Rigour High Standards of Empirical Methods: Empirical papers in Top 5 journals employ state-of-the-art econometric techniques and robust identification strategies (e.g., natural experiments, randomised controlled trials, structural modelling). Theoretical Sophistication: Theoretical contributions are mathematically rigorous and provide deep insights, often with broad applicability. Thorough Robustness Checks: Authors typically provide extensive sensitivity analyses to demonstrate the robustness of their results. 3. Writing and Presentation Quality Clarity and Structure: The narrative is compelling and accessible, even to non-specialists in the subfield, while maintaining academic precision. Polished Presentation: Papers are meticulously written, with clear figures, tables, and appendices. The results are easy to interpret and visually intuitive. Tight Argumentation: Papers avoid unnecessary digressions, focusing directly on the key question and results. 4. Data Quality Novelty of Data: Top 5 papers often leverage unique or hard-to-access datasets that enable the study of questions previously out of reach. Rigorous Cleaning and Documentation: The data handling and analysis process is highly transparent, with all steps carefully documented. 5. Relevance and Impact Policy Relevance: Many Top 5 papers have clear implications for public policy or major economic debates, making their findings influential beyond academia. Cross-Disciplinary Interest: These papers often resonate with researchers in related disciplines, such as political science, sociology, or psychology, enhancing their visibility and citation potential. Citations: Papers in Top 5 journals often become highly cited due to their broad applicability and significance. 6. Extensive Peer Review and Revisions Stringent Referee Process: Top 5 journals have rigorous review processes, often involving multiple rounds of detailed feedback and revisions. High Rejection Rates: Acceptance rates are extremely low (e.g., ~5%), ensuring only the most impactful papers are published. 7. Network Effects and Prestige Author Reputation: Papers by well-known authors or prestigious institutions are more likely to receive attention and scrutiny during the review process. Citations of Existing Literature: Top 5 papers typically build upon or challenge widely recognised works, further cementing their place in prominent scholarly conversations. Comparison with Other Journals Scope and Niche: Non-Top 5 journals may focus on narrower questions or less generalisable findings, which, while still valuable, may not have the same broad impact. Data Availability: Some journals may accept papers using less novel or standard datasets, provided the analysis is sound. Methodological Simplicity: Papers in lower-ranked journals may employ standard or less sophisticated methodologies, especially in empirical studies. Less Competitive Review Process: Non-Top 5 journals generally have higher acceptance rates and shorter review timelines, making them accessible to a broader range of researchers.",
        'grant': "As a reviewer for a major research funding organization, please evaluate whether this research proposal would be competitive for major funding (0 = definitely not fundable, 10 = definitely fundable at the highest award level).",
        'top_conference': "As a program committee member for prestigious economics conferences, please evaluate whether this work would be accepted for presentation (0 = definitely reject, 10 = definitely accept for prominent session).",
        'citation_impact': "Based on the novelty, methodology, and potential influence of this research, please project the actual number of citations this paper will receive in the next 10 years (output should be a specific predicted citation count)",
        'research_award': "As a committee member for major research awards, please evaluate whether this work could be competitive for prestigious recognition (0 = definitely not award-worthy, 10 = definitely award-worthy).",
        'nobel_potential': "As a member of the Nobel Prize Committee for Economic Sciences at the Royal Swedish Academy of Sciences, please provide a realistic evaluation of whether this research publication could contribute to winning the Nobel Prize in Economics (0 = Shows no indication of Nobel Prize potential,10 = Shows definitive Nobel Prize potential)",
        'tenure_eval': "As a senior member of a research university's tenure and promotion committee, please evaluate whether this research portfolio would support a strong case for tenure, considering both the quantity and quality of contributions (0 = definitely deny tenure, 10 = exceptionally strong case for tenure)."
        }
    return prompts.get(property_name)

def read_full_paper(paper_id):
    """Read the full paper text from file."""
    try:
        filepath = f"/work/input/paper_text/{paper_id:03d}.txt"
        with open(filepath, 'r') as file:
            return file.read()
    except Exception as e:
        print(f"Error reading paper {paper_id}: {e}")
        return ""

def evaluate_submission(submission_text, paper_id, include_full_text, client, property_name):
    """Evaluate a single submission for a specific property."""
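    prompt = get_prompt_for_property(property_name)
    if prompt is None:
        # Fail fast instead of sending a request with an empty system prompt
        raise ValueError(f"Unknown property_name: {property_name!r}")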
    
    full_content = submission_text
    if include_full_text:
        full_paper = read_full_paper(paper_id)
        if full_paper:
            full_content += f"\nFull paper: {full_paper}"
    
    try:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": full_content}
            ],
            response_format={
                "type": "json_schema",
                "json_schema": {
                    "name": "submission_evaluation",
                    "strict": True,
                    "schema": {
                        "type": "object",
                        "properties": {
                            "rating": {
                                "type": "number",
                                "description": f"Rating for {property_name}"
                            }
                        },
                        "required": ["rating"],
                        "additionalProperties": False
                    }
                }
            },
            temperature=1,
            max_tokens=2024,
            top_p=1
        )
        
        raw_content = response.choices[0].message.content
        result = json.loads(raw_content)
        print(result.get('rating'), raw_content)
        print('----------------------------------------------------------')
        return result.get('rating'), raw_content
        
    except Exception as e:
        print(f"\nError evaluating submission: {e}")
        return None, "Error"

def get_last_processed_index(output_dir, property_name):
    """Get the last processed index from temporary files."""
    temp_files = glob.glob(f"{output_dir}/{property_name}/{property_name}_temp_*.csv")
    if not temp_files:
        return -1
    
    latest_temp = max(temp_files, key=os.path.getctime)
    try:
        temp_df = pd.read_csv(latest_temp)
        # Find the last row where we have a complete evaluation for this property
        eval_columns = [col for col in temp_df.columns if col.startswith(f"{property_name}_") and col != f"{property_name}_mean"]
        last_completed = temp_df[eval_columns].notna().all(axis=1)
        if not last_completed.any():
            return -1
        return temp_df[last_completed].index[-1]
    except Exception as e:
        print(f"Error reading temp file: {e}")
        return -1

def process_submissions(input_file, property_name, base_output_dir="/work/output", include_full_text=False, evaluations_per_property=3):
    """Process submissions for a single property with continuation capability.
    
    Args:
        input_file (str): Path to the input CSV file
        property_name (str): Name of the property to evaluate
        base_output_dir (str): Base directory for all output files
        include_full_text (bool): Whether to include full paper text
        evaluations_per_property (int): Number of evaluations to perform per property
    """
    
    # Create output directories
    property_output_dir = f"{base_output_dir}/{property_name}"
    os.makedirs(property_output_dir, exist_ok=True)
    
    # Define main output file path
    main_output_file = f"{base_output_dir}/output_{property_name}.csv"
    
    # Check if main output file exists and load it instead of input file if it does
    if os.path.exists(main_output_file):
        print(f"Loading existing progress from {main_output_file}")
        df = pd.read_csv(main_output_file)
    else:
        print(f"Loading from input file {input_file}")
        df = pd.read_csv(input_file)
    
    # Initialize OpenAI client
    client = OpenAI()
    
    # Check for previous progress and load existing data
    temp_files = glob.glob(f"{property_output_dir}/{property_name}_temp_*.csv")
    if temp_files:
        latest_temp = max(temp_files, key=os.path.getctime)
        print(f"Found previous run data in {latest_temp}, attempting to load...")
        try:
            temp_df = pd.read_csv(latest_temp)
            # Get evaluation columns from the temporary file
            eval_columns = [f"{property_name}_{i+1}" for i in range(evaluations_per_property)]
            existing_eval_columns = [col for col in eval_columns if col in temp_df.columns]
            
            # Copy existing evaluations to the current dataframe
            for col in existing_eval_columns:
                df[col] = temp_df[col]
            
            # Find the last fully evaluated row
            last_processed_idx = get_last_processed_index(base_output_dir, property_name)
            print(f"Resuming from index {last_processed_idx + 1}")
        except Exception as e:
            print(f"Error loading previous data: {e}")
            last_processed_idx = -1
    else:
        last_processed_idx = -1
        
    start_idx = last_processed_idx + 1 if last_processed_idx >= 0 else 0
    
    # Initialize any missing evaluation columns
    eval_columns = [f"{property_name}_{i+1}" for i in range(evaluations_per_property)]
    for col in eval_columns:
        if col not in df.columns:
            df[col] = None
    
    temp_file_counter = max([int(f.split('_')[-1].split('.')[0]) for f in glob.glob(f"{property_output_dir}/{property_name}_temp_*.csv")] + [-1]) + 1
    
    # Process submissions
    total_tasks = (len(df) - start_idx) * evaluations_per_property
    current_task = 0
    
    # Convert start_idx to int to ensure proper indexing
    start_idx = int(start_idx)
    
    for idx in range(start_idx, len(df)):
        row = df.iloc[idx]
        
        for n in range(evaluations_per_property):
            current_task += 1
            column_name = f"{property_name}_{n+1}"
            
            # Skip if already processed
            if pd.notna(df.at[idx, column_name]):
                continue
            
            score, model_response = evaluate_submission(
                row['Submission'],
                row['Paper_id'],
                include_full_text,
                client,
                property_name
            )
            
            df.at[idx, column_name] = score
            
            # Print progress in one line
            progress = (current_task / total_tasks) * 100
            print(f"\rProgress: {progress:.1f}% | Submission_id: {row['Submission_id']} | Paper_id: {row['Paper_id']} | Score: {score}", end='')
            
            # Save temporary file and update main output every 10 predictions
            if current_task % 10 == 0:
                try:
                    # Calculate mean for this property
                    eval_columns = [f"{property_name}_{i+1}" for i in range(evaluations_per_property)]
                    # Convert columns to numeric, forcing errors to NaN
                    for col in eval_columns:
                        df[col] = pd.to_numeric(df[col], errors='coerce')
                    # Calculate mean, handling NaN values
                    df[f"{property_name}_mean"] = df[eval_columns].astype(float).mean(axis=1).round(2)
                    
                    # Ensure directory exists again (in case it was deleted)
                    os.makedirs(property_output_dir, exist_ok=True)
                    
                    # Save temporary file
                    temp_filename = f"{property_output_dir}/{property_name}_temp_{temp_file_counter}.csv"
                    print(f"\nAttempting to save temp file to: {temp_filename}")
                    df.to_csv(temp_filename, index=False)
                    
                    # Update main output file
                    main_output_file = f"{base_output_dir}/output_{property_name}.csv"
                    print(f"Attempting to save main output to: {main_output_file}")
                    df.to_csv(main_output_file, index=False)
                    
                    print(f"Successfully saved files")
                    temp_file_counter += 1
                except Exception as e:
                    print(f"\nError saving files: {e}")
                    print(f"Current directory structure:")
                    print(f"Base output dir exists: {os.path.exists(base_output_dir)}")
                    print(f"Property output dir exists: {os.path.exists(property_output_dir)}")
                    print(f"Current working directory: {os.getcwd()}")
                    print("\nColumn types:")
                    for col in eval_columns:
                        print(f"{col}: {df[col].dtype}")
    
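    # Calculate mean for property. Coerce to numeric first: columns initialised
    # with None have object dtype until the first periodic save converts them.
    for col in eval_columns:
        df[col] = pd.to_numeric(df[col], errors='coerce')
    df[f"{property_name}_mean"] = df[eval_columns].mean(axis=1).round(2)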
    
    # Save final results
    final_output_file = f"{base_output_dir}/output_{property_name}.csv"
    final_output_file_backup = f"{base_output_dir}/output_{property_name}_final.csv"
    df.to_csv(final_output_file, index=False)
    df.to_csv(final_output_file_backup, index=False)
    
    print(f"\n\nFinal results saved to {final_output_file} and {final_output_file_backup}")
    return df

# Example usage:
# process_submissions(
#     input_file='input.csv',
#     property_name='top5_accept_rating',
#     base_output_dir='/work/output',  # Explicitly set output directory
#     include_full_text=False,
#     evaluations_per_property=3
# )
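
The evaluation loop above stores one numeric rating per API call and then averages the repeated calls for each submission. The sketch below isolates that parse-then-average step using only the standard library. The helper names `parse_rating` and `mean_rating` are hypothetical and do not appear in the notebook. Treating malformed or missing ratings as missing values is an assumption, chosen to mirror the `errors='coerce'` handling used when the mean is computed.

```python
import json
import math
from statistics import mean


def parse_rating(raw_response):
    """Return the 'rating' field of a JSON response as a float, or None if unusable."""
    try:
        value = json.loads(raw_response).get("rating")
    except (json.JSONDecodeError, AttributeError, TypeError):
        # Covers non-JSON strings such as "Error", non-object JSON, and None input.
        return None
    # Booleans are ints in Python, so exclude them explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return None if math.isnan(value) else float(value)


def mean_rating(raw_responses):
    """Average the usable ratings, rounded to 2 decimals; None if there are none."""
    ratings = [r for r in (parse_rating(raw) for raw in raw_responses) if r is not None]
    return round(mean(ratings), 2) if ratings else None


# Three repeated evaluations, one of which failed and returned the "Error" sentinel.
responses = ['{"rating": 7}', '{"rating": 8.5}', "Error"]
print(mean_rating(responses))
```

Skipping failed calls rather than counting them as zero keeps a single API error from dragging down a submission's mean, at the cost of averaging over fewer samples.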

In [2]:
Non_blind_submission_df = generate_submissions(
    input_filepath="/work/input/full_30_input_journal_paper.csv",
    n_papers_per_journal=3,  # Sample 3 papers per journal
    n_top=10,
    n_bottom=10,
    n_random=10,
    n_institutions=10
)

Dataset Size Calculation:
Number of journals: 10
Papers per journal: 3
Total papers sampled: 30
Number of names: 30 (10 top + 10 bottom + 10 random)
Number of institutions: 10

Total rows = 30 papers × 30 names × 10 institutions
            = 30 × 30 × 10
            = 9000

Actual number of rows: 9000

Distribution of Name Categories:
Name_Category
Top       3000
Bottom    3000
Random    3000
Name: count, dtype: int64

Distribution of Institutions:
Institution
Massachusetts Institute of Technology;               900
Harvard University;                                  900
University of Warwick;                               900
London School of Economics and Political Science;    900
University of Tokyo;                                 900
University of Cape Town;                             900
Nanyang Technological University;                    900
Chulalongkorn University;                            900
Universiti Malaya;                                   900
None                 

In [3]:
Non_blind_submission_df

Unnamed: 0,Paper_id,Submission,Original_Publication,Name_Category,Institution
0,1,A submission with the following details: Title...,Journal of Political Economy,Top,Massachusetts Institute of Technology;
1,1,A submission with the following details: Title...,Journal of Political Economy,Top,Harvard University;
2,1,A submission with the following details: Title...,Journal of Political Economy,Top,University of Warwick;
3,1,A submission with the following details: Title...,Journal of Political Economy,Top,London School of Economics and Political Science;
4,1,A submission with the following details: Title...,Journal of Political Economy,Top,University of Tokyo;
...,...,...,...,...,...
8995,29,A submission with the following details: Title...,GPT-o1,Random,University of Cape Town;
8996,29,A submission with the following details: Title...,GPT-o1,Random,Nanyang Technological University;
8997,29,A submission with the following details: Title...,GPT-o1,Random,Chulalongkorn University;
8998,29,A submission with the following details: Title...,GPT-o1,Random,Universiti Malaya;
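
The 9000 rows reported above are what a full cross product of papers, author names, and institutions would produce. As a minimal sketch of that arithmetic, assuming every paper is paired with every name and every institution (the body of `generate_submissions` is not shown in this section, so this is an illustration and not its implementation), with placeholder labels standing in for the real records:

```python
from itertools import product

# Placeholder labels; the notebook uses real paper records, author names, and institutions.
paper_ids = range(1, 31)                         # 30 sampled papers
names = [f"name_{i}" for i in range(30)]         # 10 top + 10 bottom + 10 random
institutions = [f"inst_{i}" for i in range(10)]  # 10 institution slots

rows = list(product(paper_ids, names, institutions))
rows_per_paper = len(rows) // len(paper_ids)

print(len(rows), rows_per_paper)
```

Under this assumption each paper contributes 300 non-blind rows, which is consistent with the per-paper count of 301 seen later once one blind row per paper is appended.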


# Blind Submission

In [4]:
import pandas as pd

def generate_blind_submissions(input_filepath, n_papers_per_journal=1):
    """
    Generate submissions without authors and affiliations.
    
    Parameters:
    -----------
    input_filepath : str
        Path to the input CSV file containing papers
    n_papers_per_journal : int, optional (default=1)
        Number of papers to sample from each journal
        
    Returns:
    --------
    pandas.DataFrame
        DataFrame containing papers with 'None' for authors and institutions
    """
    
    # Load the journal paper dataset
    df_journalpaper = pd.read_csv(input_filepath)
    
    # Sample papers from each journal
    sampled_papers = []
    for journal in df_journalpaper['Journal'].unique():
        journal_papers = df_journalpaper[df_journalpaper['Journal'] == journal].sample(
            n=min(n_papers_per_journal, len(df_journalpaper[df_journalpaper['Journal'] == journal])),
            replace=False
        )
        sampled_papers.append(journal_papers)
    
    # Combine sampled papers
    df_journalpaper = pd.concat(sampled_papers, ignore_index=True)

    # Create empty lists to store submissions
    submissions = []
    paper_ids = []
    original_publications = []

    # Generate submissions without authors and affiliations
    for _, paper in df_journalpaper.iterrows():
        # Create the submission text
        submission_text = (
            f"A submission with the following details: "
            f"Title: {paper['Title']}; "
            f"Author: None; "
            f"Affiliation: None; "
            f"Abstract: {paper['Abstract']};"
        )
        
        # Append information
        submissions.append(submission_text)
        paper_ids.append(paper['Paper #'])
        original_publications.append(paper['Journal'])

    # Create the final dataframe
    submission_df = pd.DataFrame({
        'Paper_id': paper_ids,
        'Submission': submissions,
        'Original_Publication': original_publications,
        'Name_Category': 'None',
        'Institution': 'None'
    })

    # Reset index
    submission_df = submission_df.reset_index(drop=True)

    # Calculate and print statistics
    n_journals = len(df_journalpaper['Journal'].unique())
    n_papers = len(df_journalpaper)

    print(f"Dataset Size Calculation:")
    print(f"Number of journals: {n_journals}")
    print(f"Papers per journal: {n_papers_per_journal}")
    print(f"Total papers sampled: {n_papers}")
    
    print("\nDistribution of Papers per Journal:")
    print(submission_df['Original_Publication'].value_counts())

    # Display a sample submission
    print("\nSample submission format:")
    print(submission_df['Submission'].iloc[0])

    return submission_df
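
The sampling step in the function above is unseeded, so a rerun could select different papers whenever a journal holds more papers than `n_papers_per_journal`. The sketch below shows one way to make per-journal sampling reproducible, using only the standard library. The function `sample_per_journal` and its `seed` parameter are hypothetical and not part of the notebook, and the four-paper input is made-up illustration data.

```python
import random
from collections import defaultdict


def sample_per_journal(papers, n_per_journal, seed=0):
    """Pick up to n_per_journal papers from each journal, reproducibly."""
    rng = random.Random(seed)  # local generator; the global random state is untouched
    by_journal = defaultdict(list)
    for paper in papers:
        by_journal[paper["Journal"]].append(paper)

    sampled = []
    for group in by_journal.values():
        # Never request more papers than the journal actually has.
        k = min(n_per_journal, len(group))
        sampled.extend(rng.sample(group, k))
    return sampled


# Illustration data only: journal "A" has three papers, journal "B" has one.
papers = [
    {"Paper #": 1, "Journal": "A"},
    {"Paper #": 2, "Journal": "A"},
    {"Paper #": 3, "Journal": "A"},
    {"Paper #": 4, "Journal": "B"},
]

first = sample_per_journal(papers, n_per_journal=2, seed=42)
second = sample_per_journal(papers, n_per_journal=2, seed=42)
print(first == second, len(first))
```

In pandas, the analogous change would be passing `random_state` to `DataFrame.sample`. With three papers per journal and `n_papers_per_journal=3`, as in this notebook, every paper is selected either way; only the row order can differ between runs.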

In [5]:
Blind_submission_df = generate_blind_submissions(
    input_filepath="/work/input/full_30_input_journal_paper.csv",
    n_papers_per_journal=3,  # Sample 3 papers per journal
)

Dataset Size Calculation:
Number of journals: 10
Papers per journal: 3
Total papers sampled: 30

Distribution of Papers per Journal:
Original_Publication
Journal of Political Economy                   3
Econometrica                                   3
The Quarterly Journal of Economics             3
Economica                                      3
Oxford Bulletin of Economics and Statistics    3
European Economic Review                       3
Asian Economic and Financial Review (AEFR)     3
Business and Economics Journal                 3
Journal of Applied Economics and Business      3
GPT-o1                                         3
Name: count, dtype: int64

Sample submission format:
A submission with the following details: Title: Endogenous Liquidity and Capital Reallocation; Author: None; Affiliation: None; Abstract: This paper studies economies where firms acquire capital in primary markets and then, after idiosyncratic productivity shocks, retrade it in secondary markets that i

In [6]:
Blind_submission_df

Unnamed: 0,Paper_id,Submission,Original_Publication,Name_Category,Institution
0,1,A submission with the following details: Title...,Journal of Political Economy,,
1,2,A submission with the following details: Title...,Journal of Political Economy,,
2,3,A submission with the following details: Title...,Journal of Political Economy,,
3,6,A submission with the following details: Title...,Econometrica,,
4,4,A submission with the following details: Title...,Econometrica,,
5,5,A submission with the following details: Title...,Econometrica,,
6,9,A submission with the following details: Title...,The Quarterly Journal of Economics,,
7,8,A submission with the following details: Title...,The Quarterly Journal of Economics,,
8,7,A submission with the following details: Title...,The Quarterly Journal of Economics,,
9,11,A submission with the following details: Title...,Economica,,


In [7]:
combined_df = pd.concat([Non_blind_submission_df, Blind_submission_df], axis=0, ignore_index=True)

In [8]:
combined_df

Unnamed: 0,Paper_id,Submission,Original_Publication,Name_Category,Institution
0,1,A submission with the following details: Title...,Journal of Political Economy,Top,Massachusetts Institute of Technology;
1,1,A submission with the following details: Title...,Journal of Political Economy,Top,Harvard University;
2,1,A submission with the following details: Title...,Journal of Political Economy,Top,University of Warwick;
3,1,A submission with the following details: Title...,Journal of Political Economy,Top,London School of Economics and Political Science;
4,1,A submission with the following details: Title...,Journal of Political Economy,Top,University of Tokyo;
...,...,...,...,...,...
9025,25,A submission with the following details: Title...,Journal of Applied Economics and Business,,
9026,26,A submission with the following details: Title...,Journal of Applied Economics and Business,,
9027,30,A submission with the following details: Title...,GPT-o1,,
9028,29,A submission with the following details: Title...,GPT-o1,,


In [9]:
# Start from 1 instead of 0
combined_df.insert(0, 'Submission_id', [f'S_{str(i).zfill(6)}' for i in range(1, len(combined_df) + 1)])
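
The identifier scheme above is a fixed prefix followed by a 1-based row number, zero-padded to six digits. A minimal standalone sketch of the same format follows; the function name `make_submission_ids` and its `prefix` and `width` parameters are hypothetical, introduced only for illustration.

```python
def make_submission_ids(n_rows, prefix="S_", width=6):
    """Build 1-based, zero-padded IDs such as S_000001."""
    return [f"{prefix}{i:0{width}d}" for i in range(1, n_rows + 1)]


ids = make_submission_ids(9030)
print(ids[0], ids[-1])
```

Zero-padding keeps the IDs in row order under a plain string sort, which holds as long as the row count stays below 10**width.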

In [10]:
combined_df

Unnamed: 0,Submission_id,Paper_id,Submission,Original_Publication,Name_Category,Institution
0,S_000001,1,A submission with the following details: Title...,Journal of Political Economy,Top,Massachusetts Institute of Technology;
1,S_000002,1,A submission with the following details: Title...,Journal of Political Economy,Top,Harvard University;
2,S_000003,1,A submission with the following details: Title...,Journal of Political Economy,Top,University of Warwick;
3,S_000004,1,A submission with the following details: Title...,Journal of Political Economy,Top,London School of Economics and Political Science;
4,S_000005,1,A submission with the following details: Title...,Journal of Political Economy,Top,University of Tokyo;
...,...,...,...,...,...,...
9025,S_009026,25,A submission with the following details: Title...,Journal of Applied Economics and Business,,
9026,S_009027,26,A submission with the following details: Title...,Journal of Applied Economics and Business,,
9027,S_009028,30,A submission with the following details: Title...,GPT-o1,,
9028,S_009029,29,A submission with the following details: Title...,GPT-o1,,


In [11]:
# Instead of showing all data at once, let's break it down:

# 1. Basic shape and info
print("DataFrame Shape:", combined_df.shape)
print("\nDataFrame Data Types:")
print(combined_df.dtypes)

# 2. Basic statistics with limited rows
print("\nBasic Statistics (first 10 rows):")
print(combined_df.describe().head(10))

# 3. Sample of value counts (top 10 for each column)
for column in combined_df.columns:
    print(f"\nTop 10 value counts for {column}:")
    print(combined_df[column].value_counts().head(10))

DataFrame Shape: (9030, 6)

DataFrame Data Types:
Submission_id           object
Paper_id                 int64
Submission              object
Original_Publication    object
Name_Category           object
Institution             object
dtype: object

Basic Statistics (first 10 rows):
          Paper_id
count  9030.000000
mean     15.500000
std       8.655921
min       1.000000
25%       8.000000
50%      15.500000
75%      23.000000
max      30.000000

Top 10 value counts for Submission_id:
Submission_id
S_009030    1
S_000001    1
S_000002    1
S_000003    1
S_000004    1
S_000005    1
S_008991    1
S_008992    1
S_008993    1
S_008994    1
Name: count, dtype: int64

Top 10 value counts for Paper_id:
Paper_id
1     301
2     301
3     301
5     301
4     301
6     301
9     301
8     301
7     301
12    301
Name: count, dtype: int64

Top 10 value counts for Submission:
Submission
A submission with the following details: Title: The Impact of Digitalization on Labor Markets: Evidence fr

In [12]:
combined_df.to_csv("/work/process/full_30_submission.csv", index=False)

In [13]:
combined_df.head(10).to_csv("/work/process/full_30_submission_test.csv", index=False)
