# <b> (Phase 2) Using the ERIC API and Python for a Systematic Review of P-20 Tracking: Searching</b>
---
This Jupyter Notebook demonstrates how to connect to and use the ERIC API to collect literature and research topics from <a href="https://eric.ed.gov/">eric.ed.gov</a>. It is a modified version of ERIC's public Jupyter Notebook, which uses Python and pandas.
<ul>
<li><b>Source:</b> <a href="https://eric.ed.gov/pdf/Using_ERIC_API_for_Research_Topics.pdf">ERIC API Jupyter Notebook</a></li>
<li><b>Python Distribution:</b> <a href="https://www.anaconda.com/download">Anaconda Download</a></li>
<li><b>Jupyter Notebook Install:</b> <a href="https://docs.anaconda.com/ae-notebooks/user-guide/basic-tasks/apps/jupyter/index.html">Jupyter Notebook (Anaconda)</a></li>   
</ul>

<b>Project:</b> Systematic Literature Review of Tracking in P-20 Education<br>
<b>Authors:</b> Amy Stich, Sean Baser, Collin Case, Hunter Jones, Ananya Malik, Carlie Cooper, & Jolssen Rodriguez<br>
<b>Questions about the script:</b> Sean Baser (<sbaser@sheeo.org>)<br>

<b><ins>Tips for Searching the <a href="https://eric.ed.gov/?faq-searching_api">ERIC API</a></ins></b>
<ul>
  <li>The default operator between search terms in the API is OR, not AND as on the ERIC website. Insert AND explicitly whenever you want to require multiple search terms.</li>
  <li>You do not need to add a * to search terms to retrieve variations of the search word—the API automatically stems terms. For example, if you search the word “read,” the API will also pick up the words “reads,” “readers,” and “reading.” 
    <ul>
      <li><i>Comment: I have not found this to be true. Using a wildcard returned more results than the stem alone.</i></li>
      <li>Example:
        <ul>
          <li>getRecordCount('(pathway)') | Search (pathway) returned 4,398 records</li>
          <li>getRecordCount('(pathways)') | Search (pathways) returned 10,595 records</li>
          <li>getRecordCount('(pathway*)') | Search (pathway*) returned 13,954 records</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>You must use straight quote marks (for example: "word") and not smart or curly quote marks (for example: “word”).</li>
  <li>Some of the field names in the API differ from the field names used in ERIC website searches; for example, the API uses the term subject while the website uses the term descriptor. There is a table of field names on the API landing page.</li>
</ul>
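Because the API rejects smart quotes and defaults to OR, a query pasted from a word processor can silently return the wrong records. A minimal sketch, assuming a hypothetical helper named normalizeQuotes (not part of the ERIC notebook), that converts curly quotes to straight quotes before a search string is sent:

```python
def normalizeQuotes(search):
    """Replace curly (smart) quote marks with straight quotes for the ERIC API.

    normalizeQuotes is a hypothetical helper for illustration only.
    """
    replacements = {
        '\u201c': '"',  # left double quotation mark
        '\u201d': '"',  # right double quotation mark
        '\u2018': "'",  # left single quotation mark
        '\u2019': "'",  # right single quotation mark
    }
    for curly, straight in replacements.items():
        search = search.replace(curly, straight)
    return search


# Example: a phrase copied from a word processor with curly quotes
print(normalizeQuotes('subject:\u201cAbility Grouping\u201d'))  # subject:"Ability Grouping"
```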



---
## <ins>Connecting to the ERIC API</ins>

### Import required Python libraries
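We start by importing the Python libraries used in the notebook. Only `requests`, `pandas`, and `time` are used by the searching steps in this section; the other imports are not needed here and can be removed if no later step uses them.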

In [1]:
# Importing necessary libraries
import numpy as np  # NumPy for numerical operations
import pandas as pd  # pandas for data manipulation and analysis
import requests  # Requests for making HTTP requests
import json  # JSON for handling JSON data
import time  # Time for timing API calls
import matplotlib.pyplot as plt  # Matplotlib for plotting
import seaborn as sns  # Seaborn for statistical data visualization
import nltk  # NLTK for natural language processing (tokenization, stemming, lemmatization)
from io import BytesIO  # BytesIO to handle binary data in memory
from datetime import datetime  # Date and time stamps
import os  # File and path operations
import shutil  # Copying and moving files
from openpyxl import load_workbook  # Reading and editing Excel workbooks
from IPython.display import display  # display() renders DataFrames in Jupyter

# win32com (pywin32) is Windows-only; import it only where it is available
try:
    import win32com.client as win32  # Automating Excel through COM
except ImportError:
    win32 = None

# Setting the Seaborn style for plots
sns.set()

### Next we create a function to make ERIC API requests
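The following function (getEricRecords) calls the ERIC API when invoked. It takes four arguments (search, fields, start, and rows) and always requests the JSON format. The underlying ERIC API parameters are: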
<ul> 
<li> <b>Search:</b> The search criteria. See the ERIC API docs (<a href= "https://eric.ed.gov/?advanced">ERIC Advanced Search</a>) for details.</li>
<li> <b>Format:</b> The response format. Valid values are xml (default), json, and csv; getEricRecords always requests json.</li>
<li> <b>Start:</b> This parameter supports pagination by specifying the starting record index for the returned record set. The default is 0.</li>
<li> <b>Rows:</b> By default, the ERIC API returns 20 records at a time. This parameter can be set to a value between 20 and 200; getEricRecords uses 200.</li>
<li> <b>Fields:</b> Specifies the fields to include in the returned records. See the ERIC API docs (<a href= "https://eric.ed.gov/pdf/ERIC_Field.pdf">ERIC Field Guide</a>) for details.</li>
</ul>
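To see how these parameters end up in a request, the sketch below builds the query string with Python's standard urllib.parse.urlencode. It only illustrates URL encoding: the base URL is the one getEricRecords uses, buildEricUrl is a hypothetical helper (not part of the ERIC API), and the rows check simply follows the 20 to 200 range listed above.

```python
from urllib.parse import urlencode

ERIC_BASE_URL = 'https://api.ies.ed.gov/eric/'


def buildEricUrl(search, fields=None, start=0, rows=200, fmt='json'):
    """Build an ERIC API request URL (hypothetical helper for illustration)."""
    if not 20 <= rows <= 200:
        raise ValueError('rows must be between 20 and 200')
    params = {'search': search, 'format': fmt, 'start': start, 'rows': rows}
    if fields:
        # Multiple fields are sent as one comma-separated value
        params['fields'] = ','.join(fields)
    # urlencode escapes quotes, colons, commas, and parentheses and turns spaces into '+'
    return ERIC_BASE_URL + '?' + urlencode(params)


print(buildEricUrl('subject:"Ability Grouping"', fields=['id', 'title']))
```

Encoding the search string this way matters because ERIC queries are full of spaces, quotes, and parentheses that are not safe to paste into a URL unescaped.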

In [2]:
# Define a function to get records from the ERIC database
def getEricRecords(search, fields=None, start=0, rows=200):
    """Call the ERIC API and return the parsed JSON response, or None on error."""
    url = 'https://api.ies.ed.gov/eric/'
    # Passing params lets requests URL-encode the quotes, spaces, and parentheses in the search
    params = {'search': search, 'format': 'json', 'start': start, 'rows': rows}
    if fields:
        params['fields'] = ','.join(fields)
    response = requests.get(url, params=params, timeout=60)

    # Only print error messages if the response status is not OK
    if response.status_code != 200:
        print(f"Error with status code: {response.status_code}")
        return None  # Return None to indicate an error occurred
    return response.json()

### Get Number of Records for Search
This utility function returns the number of records returned by the ERIC API for a given search.
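The count comes from the `numFound` value inside the `response` object of the JSON the API returns. The sketch below isolates that lookup so it can be checked without a network call; extractRecordCount is a hypothetical name, and the sample dictionary is a made-up stand-in that mimics only the keys getRecordCount reads.

```python
def extractRecordCount(responseJson):
    """Return numFound from an ERIC API JSON response, or 0 if it is missing.

    Hypothetical helper: mirrors the checks in getRecordCount and also
    treats None (a failed request in getEricRecords) as zero records.
    """
    if not responseJson:
        return 0
    return responseJson.get('response', {}).get('numFound', 0)


# Made-up sample shaped like the keys getRecordCount reads (not real API output)
sample = {'response': {'numFound': 42, 'docs': []}}
print(extractRecordCount(sample))  # 42
print(extractRecordCount(None))    # 0
```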

In [3]:
# Define a function to get the total record count for a given search term
def getRecordCount(search):
    responseJson = getEricRecords(search)
    
    # Check that the request succeeded and that 'response' and 'numFound' are present
    if responseJson and 'response' in responseJson and 'numFound' in responseJson['response']:
        totalRecords = responseJson['response']['numFound']
    else:
        # Handle the case where 'numFound' is missing
        print("Error: 'numFound' not found in the response.")
        totalRecords = 0
    
    print('Search', search, 'returned', format(totalRecords), 'records')
    return totalRecords

### Create function to get all records for an ERIC API search
The following function, getAllEricRecords, calls getEricRecords as many times as needed to fetch every record matching an ERIC API search, 200 records per call. If no return fields are specified, the API returns its default set of frequently used fields. The cleanElementsUsingList function tidies the resulting DataFrame: fields that the ERIC API returns as empty lists become empty values (None), and multi-valued fields are joined into comma-separated strings.
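The paging in getAllEricRecords advances the start index by 200 until it reaches the total count, so a search needs ceil(total / 200) API calls. A minimal sketch of the start offsets it requests, using the 3,292 records that Search 1 returned as the example; pageOffsets is a hypothetical name:

```python
import math


def pageOffsets(totalRecords, rows=200):
    """Start indexes requested when paging through totalRecords, rows at a time."""
    return list(range(0, totalRecords, rows))


offsets = pageOffsets(3292)
print(len(offsets), math.ceil(3292 / 200))  # 17 17
print(offsets[:3], offsets[-1])             # [0, 200, 400] 3200
```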

In [4]:
# Define a function to clean elements in a list
def cleanElementsUsingList(x):
    if not isinstance(x, list):
        return x
    if not x or (len(x) == 1 and x[0] == ''):
        return None
    return ', '.join(str(item) for item in x)  # Join multi-valued fields into one string

# Function to get all records from the ERIC database for a given search term
def getAllEricRecords(search, fields=None, cleanElements=True):
    startTime = time.time()
    nextFirstRecord = 0
    numRecordsReturnedEachApiCall = 200
    totalRecords = getRecordCount(search)
    if totalRecords == 0:
        print('Search', search, 'has no results')
        return pd.DataFrame()  # Return an empty DataFrame

    all_records = []
    while nextFirstRecord < totalRecords:
        responseJson = getEricRecords(search, fields, nextFirstRecord, numRecordsReturnedEachApiCall)
        if responseJson and 'response' in responseJson and 'docs' in responseJson['response']:
            docs = responseJson['response']['docs']
            all_records.extend(docs)
        else:
            print("Error: 'docs' not found in the response.")
            return pd.DataFrame()  # Return an empty DataFrame if 'docs' not found
        nextFirstRecord += numRecordsReturnedEachApiCall
    
    print('took', '{:,.1f}'.format(time.time() - startTime), 'seconds')
    
    records = pd.DataFrame(all_records)
    if cleanElements:
        for column in records.columns:
            records[column] = records[column].map(cleanElementsUsingList)
    return records
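To see what the cleaning step does to individual cells, the calls below apply the same logic as cleanElementsUsingList to a few sample values. The function is repeated here (with list items converted to strings) so the snippet runs on its own, and the inputs are invented examples of the shapes the ERIC API returns, not real records.

```python
def cleanElementsUsingList(x):
    """Flatten list-valued cells: empty lists become None, others a comma-separated string."""
    if not isinstance(x, list):
        return x  # Scalars (strings, numbers, None) pass through unchanged
    if not x or (len(x) == 1 and x[0] == ''):
        return None  # [] or [''] means the field is empty
    return ', '.join(str(item) for item in x)


for value in [['Ability Grouping', 'Student Placement'], [], [''], 'T', 1978]:
    print(repr(value), '->', repr(cleanElementsUsingList(value)))
```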

## <ins>ERIC Fields</ins>
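Below are the fields available in the ERIC API. We define the variable allEricFields so that every listed field is downloaded, rather than only the default set of frequently used fields. The list has 37 entries, including source_search, a project-defined label (not an ERIC API field) that records which search returned each record. The same field names are used in queries to search and filter the database (for example, subject: and educationlevel:).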
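Because source_search is filled in by this notebook rather than returned by ERIC, one option is to leave it out of the field list sent to the API and add it afterward, as the search cells do. A minimal sketch using a shortened, made-up stand-in for allEricFields and a hypothetical PROJECT_COLUMNS set:

```python
# Shortened stand-in for allEricFields (illustrative subset only)
exampleFields = ['id', 'title', 'author', 'subject', 'educationlevel', 'source_search']

# Columns this notebook adds itself; these are not ERIC API fields (hypothetical set)
PROJECT_COLUMNS = {'source_search'}

# Keep only the fields the ERIC API itself can return
apiFields = [field for field in exampleFields if field not in PROJECT_COLUMNS]
print(apiFields)  # ['id', 'title', 'author', 'subject', 'educationlevel']
```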

In [5]:
# List of all possible fields in the ERIC API response
allEricFields = [
    'record',  # Record (newly added field if required)
    'id',  # ERIC Number
    'title',  # Title
    'author',  # Author
    'source',  # Source
    'publicationdateyear',  # Publication Date
    'description',  # Abstract
    'subject',  # Descriptors
    'peerreviewed',  # Peer Reviewed
    'abstractor',  # Abstractor
    'audience',  # Audience
    'authorxlink',  # Author ID URL
    'e_datemodified',  # Date Modified
    'e_fulltextauth',  # Full-Text Download
    'e_yearadded',  # Year Added
    'educationlevel',  # Education Level
    'identifiersgeo',  # Identifiers - Location
    'identifierslaw',  # Identifiers - Laws, Policies, & Programs
    'identifierstest',  # Identifiers - Assessments and Surveys
    'iescited',  # IES Cited
    'iesfunded',  # IES Funded
    'iesgrantcontractnum',  # Grant or Contract Numbers
    'iesgrantcontractnumxlink',  # Grant or Contract Numbers link
    'ieslinkdatasource',  # Data File: URL
    'ieslinkpublication',  # IES Publication
    'ieslinkwwcreviewguide',  # WWC Study Page
    'ieswwcreviewed',  # What Works Clearinghouse Reviewed
    'institution',  # Authorizing Institution
    'isbn',  # ISBN
    'issn',  # ISSN
    'language',  # Language
    'publicationtype',  # Publication Type
    'publisher',  # Publisher Information
    'sourceid',  # Citation
    'sponsor',  # Sponsor
    'url',  # Direct Link
    'source_search'  # Project-defined label for the search that returned each record; not an ERIC API field
]

# Print the number of possible fields in the ERIC API response
print('There are', len(allEricFields), 'possible fields in the ERIC API response')

There are 37 possible fields in the ERIC API response


---
## <ins>Searching the API for P-20 Tracking Articles</ins>

### Search 1: Common ERIC Thesaurus Subject Terms for Tracking
<ins>Search Parameters</ins>
<ul>
    <li><b>Subject</b>: "Track System (Education)" OR "Ability Grouping" OR "Student Placement"</li>
    <li><b>Standard Parameters</b>: publicationtype:"Journal Articles" AND peerreviewed:"T" AND -subject:"Foreign Countries"</li>
</ul>
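The Search 1 subject clause is a list of ERIC Thesaurus terms joined with OR. A sketch of a hypothetical orClause helper that assembles such a clause from a list, so the terms live in one place and are quoted consistently with straight quotes; it reproduces the subject portion of search_1:

```python
def orClause(field, values):
    """Join field:"value" pairs with OR inside parentheses (hypothetical helper)."""
    return '(' + ' OR '.join(f'{field}:"{value}"' for value in values) + ')'


trackingSubjects = ['Track System (Education)', 'Ability Grouping', 'Student Placement']
subjectClause = orClause('subject', trackingSubjects)
print(subjectClause)
```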

In [6]:
# First search query
search_1 = '(subject:"Track System (Education)" OR subject:"Ability Grouping" OR subject:"Student Placement") ' \
           'AND (publicationtype:"Journal Articles" AND peerreviewed:"T" AND -subject:"Foreign Countries")'
records_1 = getAllEricRecords(search_1, allEricFields)

# Ensure records_1 is a DataFrame
if isinstance(records_1, pd.DataFrame) and not records_1.empty:
    # Save records to a master file (a copy, so records_1 is left unchanged)
    master_records = records_1.copy()

    # Add a 'source_search' column labeling every record from this search as "Search 1"
    master_records['source_search'] = "Search 1"

    # Check if 'publicationdateyear' column exists and handle missing data
    if 'publicationdateyear' in master_records.columns:
        master_records['publicationdateyear'] = pd.to_numeric(master_records['publicationdateyear'], errors='coerce')
        master_records = master_records.sort_values(by='publicationdateyear')
    else:
        print("'publicationdateyear' column is missing in the records")

    # Define the desired column order, including the new 'source_search'
    desired_order = [
        'id', 'source_search', 'title', 'author', 'publicationdateyear', 'description', 'subject', 
        'source', 'publicationtype', 'peerreviewed',        
    ]

    # Reorder the columns in the DataFrame if all desired columns are present
    missing_columns = [col for col in desired_order if col not in master_records.columns]
    if missing_columns:
        print(f"Warning: The following columns are missing and will not be included: {missing_columns}")
        desired_order = [col for col in desired_order if col in master_records.columns]
    
    master_records = master_records[desired_order + [col for col in master_records.columns if col not in desired_order]]

    # Display the master records
    display(master_records)  # Use display() for Jupyter notebook
else:
    print("Error: records_1 is not a DataFrame or is empty.")

Search (subject:"Track System (Education)" OR subject:"Ability Grouping" OR subject:"Student Placement") AND (publicationtype:"Journal Articles" AND peerreviewed:"T" AND -subject:"Foreign Countries") returned 3292 records
took 10.1 seconds


,id,source_search,title,author,publicationdateyear,description,subject,source,publicationtype,peerreviewed,...,identifierstest,sponsor,iesgrantcontractnum,iesgrantcontractnumxlink,ieslinkdatasource,institution,iesfunded,ieswwcreviewed,ieslinkwwcreviewguide,isbn
959,EJ061632,Search 1,"Use of a ""Balance-Sheet"" Procedure to Improve ...","Mann, Leon",1972,This study tested the effectiveness of a tally...,"Academic Aspiration, College Bound Students, C...",Journal of Vocational Behavior,Journal Articles,T,...,,,,,,,,,,
1176,EJ198756,Search 1,Response to Sokolow.,"Heslep, Robert D.",1978,Clarification is made of a stand taken on the ...,"Access to Education, Admission Criteria, Equal...",Educational Theory,Journal Articles,T,...,,,,,,,,,,
1706,EJ196380,Search 1,Fieldwork in Industrial Settings: Opportunitie...,"Akabas, Sheila H.",1978,The rationale for industrial social welfare is...,"Business, Experiential Learning, Field Experie...",Journal of Education for Social Work,Journal Articles,T,...,,,,,,,,,,
1260,EJ196127,Search 1,The Principal and Special Education Placement.,"Yoshida, Roland K., And Others",1978,Uses the placement practices of one school as ...,"Due Process, Elementary Education, Federal Leg...",National Elementary Principal,Journal Articles,T,...,,,,,,,,,,
1990,EJ198868,Search 1,The Comparative Validity of the California Sta...,"Michael, William B., Shaffer, Phyllis",1978,The California State University and Colleges E...,"College Freshmen, English Education, Equivalen...",Educational and Psychological Measurement,"Journal Articles, Reports - Research",T,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3005,EJ1422440,Search 1,The Interplay of Glossing with Text Difficulty...,"Seyede Faezeh Hosseini Alast, Sasan Baleghizadeh",2024,The aim of this experiment was to investigate ...,"Reading Comprehension, Difficulty Level, Readi...",Language Teaching Research,"Journal Articles, Reports - Research, Tests/Qu...",T,...,,,,,,,,,,
154,EJ1426902,Search 1,Plessy&apos;s Tracks: African American Student...,Richard Lofton,2024,Why are most African American students in lowe...,"African American Students, Middle School Stude...","Race, Ethnicity and Education","Journal Articles, Reports - Research",T,...,,,,,,,,,,
978,EJ1417685,Search 1,How Academics Can Play a More Influential Role...,"William E. Donald, Helen P. N. Hughes",2024,Universities worldwide are tasked with produci...,"College Curriculum, Curriculum Design, Employm...",Industry and Higher Education,"Journal Articles, Reports - Descriptive",T,...,,,,,,,,,,
249,EJ1426302,Search 1,Tracking the Effects: Examining the Opportunit...,"Kristian Edosomwan, Jemimah L. Young, Bettie R...",2024,The relationship between academic tracking and...,"Discipline, College Preparation, Educational O...",Journal of Education,"Journal Articles, Reports - Research",T,...,Education Longitudinal Study of 2002 (NCES),,,,,,,,,


### Search 2: Pathway* and Pipeline*
This search captures records that mention pathway or pipeline (including variants such as pathways and pipelines) and are indexed at specific education levels.
<ins>Search Parameters</ins>
<ul>
    <li><b>Key Words</b>: pathway* OR pipeline*</li>
    <li><b>Education Level (educationlevel)</b>: "Preschool Education" OR "Kindergarten" OR "Grade 1" OR "Grade 2" OR "Grade 3" OR "Grade 4" OR "Grade 5" OR "Grade 6" OR "Grade 7" OR "Grade 8" OR "Grade 9" OR "Grade 10" OR "Grade 11" OR "Grade 12" OR "Junior High Schools" OR "Middle Schools" OR "High Schools" OR "Vocational High Schools" OR "Vocational Schools" OR "Elementary Education" OR "Elementary School Students" OR "Elementary Secondary Education" OR "Intermediate Grades" OR "Primary Education" OR "Secondary Education" OR "Adult Education" OR "Two Year Colleges" OR "Community Colleges" OR "Colleges" OR "Postsecondary Education" OR "Higher Education" OR "Graduate Study" OR "Early Childhood Education"</li>    
    <li><b>Standard Parameters</b>: publicationtype:"Journal Articles" AND peerreviewed:"T" AND -subject:"Foreign Countries"</li>
</ul>
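Searches 2 through 4 repeat the same 33 education levels and the same standard parameters. A sketch that defines them once and joins the clauses with an explicit AND (needed because the API defaults to OR); the names EDUCATION_LEVELS, STANDARD_FILTERS, and buildSearch are hypothetical, and orClause is defined here so the snippet runs on its own. With the pathway keywords it is intended to reproduce the search_2 string.

```python
def orClause(field, values):
    """Join field:"value" pairs with OR inside parentheses (hypothetical helper)."""
    return '(' + ' OR '.join(f'{field}:"{value}"' for value in values) + ')'


# The 33 education levels listed in the search parameters, in the same order
EDUCATION_LEVELS = (
    ['Preschool Education', 'Kindergarten']
    + [f'Grade {n}' for n in range(1, 13)]
    + ['Junior High Schools', 'Middle Schools', 'High Schools',
       'Vocational High Schools', 'Vocational Schools', 'Elementary Education',
       'Elementary School Students', 'Elementary Secondary Education',
       'Intermediate Grades', 'Primary Education', 'Secondary Education',
       'Adult Education', 'Two Year Colleges', 'Community Colleges', 'Colleges',
       'Postsecondary Education', 'Higher Education', 'Graduate Study',
       'Early Childhood Education']
)

STANDARD_FILTERS = '(publicationtype:"Journal Articles" AND peerreviewed:T AND -subject:"Foreign Countries")'


def buildSearch(keywordClause):
    """Combine keywords, education levels, and standard filters with explicit AND."""
    return ' AND '.join([keywordClause, orClause('educationlevel', EDUCATION_LEVELS), STANDARD_FILTERS])


search_2_sketch = buildSearch('(pathway* OR pipeline*)')
print(len(EDUCATION_LEVELS))  # 33
```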

In [7]:
# Second search query: pathway*/pipeline* keywords restricted to P-20 education levels
search_2 = '(pathway* OR pipeline*) AND ' \
           '(educationlevel:"Preschool Education" OR educationlevel:"Kindergarten" OR educationlevel:"Grade 1" OR educationlevel:"Grade 2" OR educationlevel:"Grade 3" OR educationlevel:"Grade 4" OR educationlevel:"Grade 5" OR educationlevel:"Grade 6" OR educationlevel:"Grade 7" OR educationlevel:"Grade 8" OR educationlevel:"Grade 9" OR educationlevel:"Grade 10" OR educationlevel:"Grade 11" OR educationlevel:"Grade 12" OR educationlevel:"Junior High Schools" OR educationlevel:"Middle Schools" OR educationlevel:"High Schools" OR educationlevel:"Vocational High Schools" OR educationlevel:"Vocational Schools" OR educationlevel:"Elementary Education" OR educationlevel:"Elementary School Students" OR educationlevel:"Elementary Secondary Education" OR educationlevel:"Intermediate Grades" OR educationlevel:"Primary Education" OR educationlevel:"Secondary Education" OR educationlevel:"Adult Education" OR educationlevel:"Two Year Colleges" OR educationlevel:"Community Colleges" OR educationlevel:"Colleges" OR educationlevel:"Postsecondary Education" OR educationlevel:"Higher Education" OR educationlevel:"Graduate Study" OR educationlevel:"Early Childhood Education") AND ' \
           '(publicationtype:"Journal Articles" AND peerreviewed:T AND -subject:"Foreign Countries")'

records_2 = getAllEricRecords(search_2, allEricFields)

# Ensure records_2 is a DataFrame
if isinstance(records_2, pd.DataFrame) and not records_2.empty:
    # Add source_search column
    records_2['source_search'] = 'Search 2'
    
    # Check for duplicates between the new records and the master file
    duplicates = master_records['id'].isin(records_2['id'])

    # Print the number of duplicate records found
    print(f"Number of duplicate records found: {duplicates.sum()}")

    # Append new records to the master file and remove duplicates
    master_records = pd.concat([master_records, records_2]).drop_duplicates(subset='id')

    # Check if 'publicationdateyear' column exists and handle missing data
    if 'publicationdateyear' in master_records.columns:
        master_records['publicationdateyear'] = pd.to_numeric(master_records['publicationdateyear'], errors='coerce')
        master_records = master_records.sort_values(by='publicationdateyear')
    else:
        print("'publicationdateyear' column is missing in the records")

    # Reorder the columns in the DataFrame if all desired columns are present
    missing_columns = [col for col in desired_order if col not in master_records.columns]
    if missing_columns:
        print(f"Warning: The following columns are missing and will not be included: {missing_columns}")
        desired_order = [col for col in desired_order if col in master_records.columns]
    
    master_records = master_records[desired_order + [col for col in master_records.columns if col not in desired_order]]

    # Display the updated master records
    display(master_records)  # Use display() for Jupyter notebook
else:
    print("Error: records_2 is not a DataFrame or is empty.")
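The deduplicate, sort, and reorder steps in this cell repeat in every search cell. A sketch of a hypothetical mergeSearchResults function that bundles them, shown on two tiny made-up DataFrames. Like the cell above, it keeps the first occurrence of each id, so a record already found by an earlier search keeps that search's source_search label.

```python
import pandas as pd


def mergeSearchResults(master, newRecords, label, columnOrder):
    """Label newRecords, append them to master, drop duplicate ids, sort, and reorder columns."""
    newRecords = newRecords.copy()
    newRecords['source_search'] = label
    overlap = int(master['id'].isin(newRecords['id']).sum())
    print(f"Number of duplicate records found: {overlap}")

    # drop_duplicates keeps the first occurrence, i.e. the record already in master
    merged = pd.concat([master, newRecords]).drop_duplicates(subset='id').copy()
    merged['publicationdateyear'] = pd.to_numeric(merged['publicationdateyear'], errors='coerce')
    merged = merged.sort_values(by='publicationdateyear')

    leading = [col for col in columnOrder if col in merged.columns]
    return merged[leading + [col for col in merged.columns if col not in leading]]


# Made-up example records (not real ERIC data)
master = pd.DataFrame({'id': ['A1', 'A2'], 'source_search': ['Search 1'] * 2,
                       'publicationdateyear': [2001, 1999]})
batch = pd.DataFrame({'id': ['A2', 'B1'], 'publicationdateyear': ['2010', '1995']})
merged = mergeSearchResults(master, batch, 'Search 2', ['id', 'source_search', 'publicationdateyear'])
print(merged)
```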

Search (pathway* OR pipeline*) AND (educationlevel:"Preschool Education" OR educationlevel:"Kindergarten" OR educationlevel:"Grade 1" OR educationlevel:"Grade 2" OR educationlevel:"Grade 3" OR educationlevel:"Grade 4" OR educationlevel:"Grade 5" OR educationlevel:"Grade 6" OR educationlevel:"Grade 7" OR educationlevel:"Grade 8" OR educationlevel:"Grade 9" OR educationlevel:"Grade 10" OR educationlevel:"Grade 11" OR educationlevel:"Grade 12" OR educationlevel:"Junior High Schools" OR educationlevel:"Middle Schools" OR educationlevel:"High Schools" OR educationlevel:"Vocational High Schools" OR educationlevel:"Vocational Schools" OR educationlevel:"Elementary Education" OR educationlevel:"Elementary School Students" OR educationlevel:"Elementary Secondary Education" OR educationlevel:"Intermediate Grades" OR educationlevel:"Primary Education" OR educationlevel:"Secondary Education" OR educationlevel:"Adult Education" OR educationlevel:"Two Year Colleges" OR educationlevel:"Community Coll

,id,source_search,title,author,publicationdateyear,description,subject,source,publicationtype,peerreviewed,...,identifierstest,sponsor,iesgrantcontractnum,iesgrantcontractnumxlink,ieslinkdatasource,institution,iesfunded,ieswwcreviewed,ieslinkwwcreviewguide,isbn
959,EJ061632,Search 1,"Use of a ""Balance-Sheet"" Procedure to Improve ...","Mann, Leon",1972,This study tested the effectiveness of a tally...,"Academic Aspiration, College Bound Students, C...",Journal of Vocational Behavior,Journal Articles,T,...,,,,,,,,,,
1461,EJ199000,Search 1,Curriculum Tracking and Educational Stratifica...,"Alexander, Karl L., And Others",1978,This study examines (1) the mechanisms by whic...,"Academic Achievement, Age Grade Placement, Cur...",American Sociological Review,"Journal Articles, Reports - Research",T,...,,,,,,,,,,
1358,EJ198024,Search 1,Services for Learning Disabled Adolescents: A ...,"McNutt, Gaye, Heller, Ginger",1978,A survey of 301 randomly selected local educat...,"Delivery Systems, Educational Problems, Identi...",Learning Disability Quarterly,Journal Articles,T,...,,,,,,,,,,
908,EJ196224,Search 1,Toward an Acceptable Definition of Emotional D...,"Algozzine, Bob, And Others",1978,,"Behavior Patterns, Behavior Problems, Definiti...",Behavioral Disorders,Journal Articles,T,...,,,,,,,,,,
147,EJ202525,Search 1,Grouping Primary School Pupils for Instruction...,"Hogben, Donald",1978,Some of the problems and hazards associated wi...,"Ability Grouping, Classification, Elementary E...",Australian Journal of Education,"Journal Articles, Reports - Evaluative",T,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3091,EJ1425465,Search 1,School Choice Strategies at the Intersections ...,"Federico R. Waitoller, Christopher Lubienski",2024,While parents of students with disabilities (S...,"School Districts, Students with Disabilities, ...",Education Policy Analysis Archives,"Journal Articles, Reports - Research",T,...,,,,,,,,,,
1817,EJ1433714,Search 1,Separate School Placement for Students with Ex...,"Jessica A. Bowman, Yi-Chen Wu, Shawnee Wakeman...",2024,Separate school placements persist for student...,"Special Schools, Students with Disabilities, C...",Journal of Special Education,"Journal Articles, Reports - Research",T,...,,Office of Special Education Programs (OSEP) (E...,H326Y170004,"""",,,,,,
1964,EJ1435287,Search 1,Identifying Critical Factors When Predicting R...,"Thomas Mgonja, Francisco Robles",2024,Completion of remedial mathematics has been id...,"Predictor Variables, Remedial Mathematics, Mat...",Journal of College Student Retention: Research...,"Journal Articles, Reports - Research",T,...,,,,,,,,,,
1440,EJ1415053,Search 1,An Intervention to Support Students Placed bel...,"Jenny Cox, Scott Kaschner, Mary Krohn",2024,The number of undergraduates placing into deve...,"Undergraduate Students, College Faculty, Inter...",PRIMUS,"Journal Articles, Reports - Research, Tests/Qu...",T,...,,,,,,,,,,


### Search 3: Tracking and Placement
This search captures records that mention tracking or placement (including variants) and are indexed at specific education levels.
<ins>Search Parameters</ins>
<ul>
    <li><b>Key Words</b>: tracking* OR placement*</li>
    <li><b>Education Level (educationlevel)</b>: "Preschool Education" OR "Kindergarten" OR "Grade 1" OR "Grade 2" OR "Grade 3" OR "Grade 4" OR "Grade 5" OR "Grade 6" OR "Grade 7" OR "Grade 8" OR "Grade 9" OR "Grade 10" OR "Grade 11" OR "Grade 12" OR "Junior High Schools" OR "Middle Schools" OR "High Schools" OR "Vocational High Schools" OR "Vocational Schools" OR "Elementary Education" OR "Elementary School Students" OR "Elementary Secondary Education" OR "Intermediate Grades" OR "Primary Education" OR "Secondary Education" OR "Adult Education" OR "Two Year Colleges" OR "Community Colleges" OR "Colleges" OR "Postsecondary Education" OR "Higher Education" OR "Graduate Study" OR "Early Childhood Education"</li>    
    <li><b>Standard Parameters</b>: publicationtype:"Journal Articles" AND peerreviewed:"T" AND -subject:"Foreign Countries"</li>
</ul>

In [8]:
# Third search query: tracking*/placement* keywords restricted to P-20 education levels
search_3 = '(tracking* OR placement*) AND ' \
           '(educationlevel:"Preschool Education" OR educationlevel:"Kindergarten" OR educationlevel:"Grade 1" OR educationlevel:"Grade 2" OR educationlevel:"Grade 3" OR educationlevel:"Grade 4" OR educationlevel:"Grade 5" OR educationlevel:"Grade 6" OR educationlevel:"Grade 7" OR educationlevel:"Grade 8" OR educationlevel:"Grade 9" OR educationlevel:"Grade 10" OR educationlevel:"Grade 11" OR educationlevel:"Grade 12" OR educationlevel:"Junior High Schools" OR educationlevel:"Middle Schools" OR educationlevel:"High Schools" OR educationlevel:"Vocational High Schools" OR educationlevel:"Vocational Schools" OR educationlevel:"Elementary Education" OR educationlevel:"Elementary School Students" OR educationlevel:"Elementary Secondary Education" OR educationlevel:"Intermediate Grades" OR educationlevel:"Primary Education" OR educationlevel:"Secondary Education" OR educationlevel:"Adult Education" OR educationlevel:"Two Year Colleges" OR educationlevel:"Community Colleges" OR educationlevel:"Colleges" OR educationlevel:"Postsecondary Education" OR educationlevel:"Higher Education" OR educationlevel:"Graduate Study" OR educationlevel:"Early Childhood Education") AND ' \
           '(publicationtype:"Journal Articles" AND peerreviewed:T AND -subject:"Foreign Countries")'

records_3 = getAllEricRecords(search_3, allEricFields)

# Ensure records_3 is a DataFrame
if isinstance(records_3, pd.DataFrame) and not records_3.empty:
    # Add source_search column
    records_3['source_search'] = 'Search 3'
    
    # Check for duplicates between the new records and the master file
    duplicates = master_records['id'].isin(records_3['id'])

    # Print the number of duplicate records found
    print(f"Number of duplicate records found: {duplicates.sum()}")

    # Append new records to the master file and remove duplicates
    master_records = pd.concat([master_records, records_3]).drop_duplicates(subset='id')

    # Check if 'publicationdateyear' column exists and handle missing data
    if 'publicationdateyear' in master_records.columns:
        master_records['publicationdateyear'] = pd.to_numeric(master_records['publicationdateyear'], errors='coerce')
        master_records = master_records.sort_values(by='publicationdateyear')
    else:
        print("'publicationdateyear' column is missing in the records")

    # Reorder the columns in the DataFrame if all desired columns are present
    missing_columns = [col for col in desired_order if col not in master_records.columns]
    if missing_columns:
        print(f"Warning: The following columns are missing and will not be included: {missing_columns}")
        desired_order = [col for col in desired_order if col in master_records.columns]
    
    master_records = master_records[desired_order + [col for col in master_records.columns if col not in desired_order]]

    # Display the updated master records
    display(master_records)  # Use display() for Jupyter notebook
else:
    print("Error: records_3 is not a DataFrame or is empty.")
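The duplicate count printed in these cells is the number of master records whose id also appears in the new search, so it measures overlap rather than how much the search adds. A small sketch with made-up ids that reports both numbers using Python sets; the variable names are illustrative:

```python
# Made-up ERIC ids for illustration (not real search results)
masterIds = {'EJ001', 'EJ002', 'EJ003'}
newIds = {'EJ002', 'EJ003', 'EJ004', 'EJ005'}

overlap = masterIds & newIds  # already in the master file
added = newIds - masterIds    # contributed only by the new search

print(len(overlap), 'overlapping;', len(added), 'new')      # 2 overlapping; 2 new
print('master size after merge:', len(masterIds | newIds))  # master size after merge: 5
```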

Search (tracking* OR placement*) AND (educationlevel:"Preschool Education" OR educationlevel:"Kindergarten" OR educationlevel:"Grade 1" OR educationlevel:"Grade 2" OR educationlevel:"Grade 3" OR educationlevel:"Grade 4" OR educationlevel:"Grade 5" OR educationlevel:"Grade 6" OR educationlevel:"Grade 7" OR educationlevel:"Grade 8" OR educationlevel:"Grade 9" OR educationlevel:"Grade 10" OR educationlevel:"Grade 11" OR educationlevel:"Grade 12" OR educationlevel:"Junior High Schools" OR educationlevel:"Middle Schools" OR educationlevel:"High Schools" OR educationlevel:"Vocational High Schools" OR educationlevel:"Vocational Schools" OR educationlevel:"Elementary Education" OR educationlevel:"Elementary School Students" OR educationlevel:"Elementary Secondary Education" OR educationlevel:"Intermediate Grades" OR educationlevel:"Primary Education" OR educationlevel:"Secondary Education" OR educationlevel:"Adult Education" OR educationlevel:"Two Year Colleges" OR educationlevel:"Community Co

,id,source_search,title,author,publicationdateyear,description,subject,source,publicationtype,peerreviewed,...,identifierstest,sponsor,iesgrantcontractnum,iesgrantcontractnumxlink,ieslinkdatasource,institution,iesfunded,ieswwcreviewed,ieslinkwwcreviewguide,isbn
959,EJ061632,Search 1,"Use of a ""Balance-Sheet"" Procedure to Improve ...","Mann, Leon",1972,This study tested the effectiveness of a tally...,"Academic Aspiration, College Bound Students, C...",Journal of Vocational Behavior,Journal Articles,T,...,,,,,,,,,,
1134,EJ203027,Search 1,A Procedure to Estimate the Probability of Err...,"Ackerson, Gary E., And Others",1978,Provides information concerning a procedure fo...,"Elementary Secondary Education, Probability, R...",Reading World,"Journal Articles, Guides - Non-Classroom",T,...,,,,,,,,,,
1176,EJ198756,Search 1,Response to Sokolow.,"Heslep, Robert D.",1978,Clarification is made of a stand taken on the ...,"Access to Education, Admission Criteria, Equal...",Educational Theory,Journal Articles,T,...,,,,,,,,,,
1706,EJ196380,Search 1,Fieldwork in Industrial Settings: Opportunitie...,"Akabas, Sheila H.",1978,The rationale for industrial social welfare is...,"Business, Experiential Learning, Field Experie...",Journal of Education for Social Work,Journal Articles,T,...,,,,,,,,,,
1990,EJ198868,Search 1,The Comparative Validity of the California Sta...,"Michael, William B., Shaffer, Phyllis",1978,The California State University and Colleges E...,"College Freshmen, English Education, Equivalen...",Educational and Psychological Measurement,"Journal Articles, Reports - Research",T,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3089,EJ1424806,Search 2,Teachers at the Speed of Light: Alternative Pa...,"Jo Lampert, Babak Dadvand",2024,What happens when Initial Teacher Education sh...,"Preservice Teacher Education, Preservice Teach...",Asia-Pacific Journal of Teacher Education,"Journal Articles, Reports - Evaluative",T,...,,,,,,,,,,
3086,EJ1424498,Search 2,Increasing Student Engagement &amp; Contributi...,"Priscilla Peña, Jen Riley, Nicole Davis",2024,Increasing student engagement is a challenge f...,"College Students, Marketing, Gamification, Com...",Marketing Education Review,"Journal Articles, Reports - Research",T,...,,,,,,,,,,
2989,EJ1418702,Search 2,&quot;Trump Can Take Away Your Status but He C...,"Michelle Rascón-Canales, Victoria Navarro Bena...",2024,This study focuses on the experiences of adult...,"Undocumented Immigrants, Adults, Federal Legis...",International Journal of Qualitative Studies i...,"Journal Articles, Reports - Research",T,...,,,,,,,,,,
2987,EJ1417153,Search 2,Content Coverage as a Persistent Exclusionary ...,"Brie Tripp, Sherri Cozzens, Catherine Hrycyk, ...",2024,STEM undergraduates navigate lengthy sequences...,"Nurses, Undergraduate Study, Attitudes, Nursin...",CBE - Life Sciences Education,"Journal Articles, Reports - Research",T,...,,,,,,,,,,


### Search 4: Stratification
This search captures records that mention stratification together with horizontal or vertical and are indexed at specific education levels.
<ins>Search Parameters</ins>
<ul>
    <li><b>Key Words</b>: (horizontal AND stratification*) OR (vertical AND stratification)</li>
    <li><b>Education Level (educationlevel)</b>: "Preschool Education" OR "Kindergarten" OR "Grade 1" OR "Grade 2" OR "Grade 3" OR "Grade 4" OR "Grade 5" OR "Grade 6" OR "Grade 7" OR "Grade 8" OR "Grade 9" OR "Grade 10" OR "Grade 11" OR "Grade 12" OR "Junior High Schools" OR "Middle Schools" OR "High Schools" OR "Vocational High Schools" OR "Vocational Schools" OR "Elementary Education" OR "Elementary School Students" OR "Elementary Secondary Education" OR "Intermediate Grades" OR "Primary Education" OR "Secondary Education" OR "Adult Education" OR "Two Year Colleges" OR "Community Colleges" OR "Colleges" OR "Postsecondary Education" OR "Higher Education" OR "Graduate Study" OR "Early Childhood Education"</li>    
    <li><b>Standard Parameters</b>: publicationtype:"Journal Articles" AND peerreviewed:"T" AND -subject:"Foreign Countries"</li>
</ul>
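Typing out the long education-level clause by hand makes typos and missing `OR`s easy to introduce. The following is a minimal sketch that builds the same Search 4 string from Python lists. `EDUCATION_LEVELS`, `or_clause`, `build_query`, and `search_4_built` are hypothetical names that are not part of the original notebook. The `AND`s are explicit because the ERIC API defaults to `OR` between terms.

```python
# Hypothetical helpers (not in the original notebook) that assemble the Search 4 query string

# Education levels listed in the Search 4 parameters, in the same order
EDUCATION_LEVELS = [
    "Preschool Education", "Kindergarten",
    *[f"Grade {n}" for n in range(1, 13)],  # Grade 1 ... Grade 12
    "Junior High Schools", "Middle Schools", "High Schools",
    "Vocational High Schools", "Vocational Schools",
    "Elementary Education", "Elementary School Students",
    "Elementary Secondary Education", "Intermediate Grades",
    "Primary Education", "Secondary Education", "Adult Education",
    "Two Year Colleges", "Community Colleges", "Colleges",
    "Postsecondary Education", "Higher Education", "Graduate Study",
    "Early Childhood Education",
]

# Standard parameters shared by every search
STANDARD_FILTERS = '(publicationtype:"Journal Articles" AND peerreviewed:T AND -subject:"Foreign Countries")'


def or_clause(field, values):
    """Join quoted field:"value" pairs with OR and wrap them in parentheses."""
    return "(" + " OR ".join(f'{field}:"{v}"' for v in values) + ")"


def build_query(keywords, levels=EDUCATION_LEVELS, filters=STANDARD_FILTERS):
    """Combine keywords, education levels, and standard filters with explicit ANDs."""
    return f"{keywords} AND {or_clause('educationlevel', levels)} AND {filters}"


search_4_built = build_query("((horizontal AND stratification*) OR (vertical AND stratification))")
```

With the list defined once, the other searches only need a different `keywords` argument.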

In [9]:
# Updated fourth search query with new parameters
search_4 = '((horizontal AND stratification*) OR (vertical AND stratification)) AND ' \
           '(educationlevel:"Preschool Education" OR educationlevel:"Kindergarten" OR educationlevel:"Grade 1" OR educationlevel:"Grade 2" OR educationlevel:"Grade 3" OR educationlevel:"Grade 4" OR educationlevel:"Grade 5" OR educationlevel:"Grade 6" OR educationlevel:"Grade 7" OR educationlevel:"Grade 8" OR educationlevel:"Grade 9" OR educationlevel:"Grade 10" OR educationlevel:"Grade 11" OR educationlevel:"Grade 12" OR educationlevel:"Junior High Schools" OR educationlevel:"Middle Schools" OR educationlevel:"High Schools" OR educationlevel:"Vocational High Schools" OR educationlevel:"Vocational Schools" OR educationlevel:"Elementary Education" OR educationlevel:"Elementary School Students" OR educationlevel:"Elementary Secondary Education" OR educationlevel:"Intermediate Grades" OR educationlevel:"Primary Education" OR educationlevel:"Secondary Education" OR educationlevel:"Adult Education" OR educationlevel:"Two Year Colleges" OR educationlevel:"Community Colleges" OR educationlevel:"Colleges" OR educationlevel:"Postsecondary Education" OR educationlevel:"Higher Education" OR educationlevel:"Graduate Study" OR educationlevel:"Early Childhood Education") AND ' \
           '(publicationtype:"Journal Articles" AND peerreviewed:T AND -subject:"Foreign Countries")'

records_4 = getAllEricRecords(search_4, allEricFields) 

# Ensure records_4 is a DataFrame
if isinstance(records_4, pd.DataFrame) and not records_4.empty:
    # Add source_search column
    records_4['source_search'] = 'Search 4'
    
    # Check for duplicates between the new records and the master file
    duplicates = master_records['id'].isin(records_4['id'])

    # Print the number of duplicate records found
    print(f"Number of duplicate records found: {duplicates.sum()}")

    # Append new records to the master file and remove duplicates
    master_records = pd.concat([master_records, records_4]).drop_duplicates(subset='id')

    # Check if 'publicationdateyear' column exists and handle missing data
    if 'publicationdateyear' in master_records.columns:
        master_records['publicationdateyear'] = pd.to_numeric(master_records['publicationdateyear'], errors='coerce')
        master_records = master_records.sort_values(by='publicationdateyear')
    else:
        print("'publicationdateyear' column is missing in the records")

    # Reorder the columns in the DataFrame if all desired columns are present
    missing_columns = [col for col in desired_order if col not in master_records.columns]
    if missing_columns:
        print(f"Warning: The following columns are missing and will not be included: {missing_columns}")
        desired_order = [col for col in desired_order if col in master_records.columns]
    
    master_records = master_records[desired_order + [col for col in master_records.columns if col not in desired_order]]

    # Display the updated master records
    display(master_records)  # Use display() for Jupyter notebook
else:
    print("Error: records_4 is not a DataFrame or is empty.")

Search ((horizontal AND stratification*) OR (vertical AND stratification)) AND(educationlevel:"Preschool Education" OR educationlevel:"Kindergarten" OR educationlevel:"Grade 1" OR educationlevel:"Grade 2" OR educationlevel:"Grade 3" OR educationlevel:"Grade 4" OR educationlevel:"Grade 5" OR educationlevel:"Grade 6" OR educationlevel:"Grade 7" OR educationlevel:"Grade 8" OR educationlevel:"Grade 9" OR educationlevel:"Grade 10" OR educationlevel:"Grade 11" OR educationlevel:"Grade 12" OR educationlevel:"Junior High Schools" OR educationlevel:"Middle Schools" OR educationlevel:"High Schools" OR educationlevel:"Vocational High Schools" OR educationlevel:"Vocational Schools" OR educationlevel:"Elementary Education" OR educationlevel:"Elementary School Students" OR educationlevel:"Elementary Secondary Education" OR educationlevel:"Intermediate Grades" OR educationlevel:"Primary Education" OR educationlevel:"Secondary Education" OR educationlevel:"Adult Education" OR educationlevel:"Two Year 

Unnamed: 0,id,source_search,title,author,publicationdateyear,description,subject,source,publicationtype,peerreviewed,...,identifierstest,sponsor,iesgrantcontractnum,iesgrantcontractnumxlink,ieslinkdatasource,institution,iesfunded,ieswwcreviewed,ieslinkwwcreviewguide,isbn
959,EJ061632,Search 1,"Use of a ""Balance-Sheet"" Procedure to Improve ...","Mann, Leon",1972,This study tested the effectiveness of a tally...,"Academic Aspiration, College Bound Students, C...",Journal of Vocational Behavior,Journal Articles,T,...,,,,,,,,,,
1877,EJ201539,Search 1,Entwicklung eines Einstufungstests fuer Deutsc...,"Kummer, Manfred, And Others",1978,"Discusses various test types, and specifically...","College Students, Foreign Students, German, Gr...",Zielsprache Deutsch,"Journal Articles, Guides - Classroom - Teacher",T,...,,,,,,,,,,
1461,EJ199000,Search 1,Curriculum Tracking and Educational Stratifica...,"Alexander, Karl L., And Others",1978,This study examines (1) the mechanisms by whic...,"Academic Achievement, Age Grade Placement, Cur...",American Sociological Review,"Journal Articles, Reports - Research",T,...,,,,,,,,,,
1358,EJ198024,Search 1,Services for Learning Disabled Adolescents: A ...,"McNutt, Gaye, Heller, Ginger",1978,A survey of 301 randomly selected local educat...,"Delivery Systems, Educational Problems, Identi...",Learning Disability Quarterly,Journal Articles,T,...,,,,,,,,,,
147,EJ202525,Search 1,Grouping Primary School Pupils for Instruction...,"Hogben, Donald",1978,Some of the problems and hazards associated wi...,"Ability Grouping, Classification, Elementary E...",Australian Journal of Education,"Journal Articles, Reports - Evaluative",T,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2855,EJ1418150,Search 1,The Work Integrated Learning Experience of a U...,"Fiona Rillotta, Lorraine Lindsay, Cassandra Gi...",2024,Inclusive higher education programs support pe...,"College Students, Students with Disabilities, ...",International Journal of Inclusive Education,"Journal Articles, Reports - Research",T,...,,,,,,,,,,
1810,EJ1436885,Search 2,Getting Black Men to the Blackboard: Factors t...,"Sarah Manchanda, Travis Bristol, Phelton Moss",2024,Despite existing recruitment and retention eff...,"African American Teachers, Males, Career Choic...",Equity & Excellence in Education,"Journal Articles, Reports - Research",T,...,,,,,,,,,,
1809,EJ1435727,Search 2,Pathways to School Improvement: Discovering Ne...,"Miguel M. Gonzales, Tiber Garza, Elizabeth Leo...",2024,The purpose of this study is to examine the ne...,"Educational Improvement, Principals, Administr...","Journal of Educational Leadership, Policy and ...","Journal Articles, Reports - Research",T,...,,,,,,,,,,
3054,EJ1404452,Search 2,Assessing the Role of Spatial Inequality in Tr...,"Rachel E. Worsham, Melissa Whatley, Andrew Cra...",2024,Objective: Vertical community college transfer...,"Bachelors Degrees, College Transfer Students, ...",Community College Review,"Journal Articles, Reports - Research",T,...,,,,,,,,,,
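The merge steps in the Search 4 cell are tagging, counting overlap, appending, deduplicating on `id`, sorting by year, and reordering columns. These can be wrapped in one function that each search cell calls. The following is a minimal sketch that assumes `pandas` and the column names used above. `merge_search_results` is a hypothetical name.

```python
import pandas as pd


def merge_search_results(master, new, label, desired_order=None):
    """Hypothetical helper mirroring the Search 4 cell: tag, count overlap,
    append, deduplicate on 'id', sort by year, and reorder columns."""
    new = new.copy()
    new["source_search"] = label

    # Records from this search whose ERIC id is already in the master file
    n_duplicates = int(new["id"].isin(master["id"]).sum())

    # keep="first" retains the master copy, so the earlier source_search label wins
    merged = (
        pd.concat([master, new], ignore_index=True)
        .drop_duplicates(subset="id", keep="first")
    )

    # Coerce years to numbers (unparseable values become NaN) and sort chronologically
    if "publicationdateyear" in merged.columns:
        merged["publicationdateyear"] = pd.to_numeric(merged["publicationdateyear"], errors="coerce")
        merged = merged.sort_values(by="publicationdateyear", kind="stable")

    # Put the desired columns first, skipping any that are missing
    if desired_order:
        front = [c for c in desired_order if c in merged.columns]
        merged = merged[front + [c for c in merged.columns if c not in front]]

    return merged.reset_index(drop=True), n_duplicates


# Usage with toy data: EJ2 already exists in the master file
master = pd.DataFrame({"id": ["EJ1", "EJ2"], "source_search": ["Search 1", "Search 2"],
                       "publicationdateyear": ["2024", "1978"]})
new = pd.DataFrame({"id": ["EJ2", "EJ3"], "publicationdateyear": ["1978", "2001"]})
merged, n_dup = merge_search_results(master, new, "Search 4", desired_order=["id", "source_search"])
```

This sketch counts overlap from the new-results side. When ids are unique in both DataFrames, that count matches the notebook's `master_records['id'].isin(records_4['id'])`. `ignore_index=True` and `reset_index` also keep the merged index from repeating values across searches.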


### Save ERIC Search Results
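The `Unnamed: 0` column in the table output above most likely appears because a CSV was written with its index and then read back with `pd.read_csv`. The following is a minimal sketch of a round-trip check that writes without the index, reloads the file, and confirms that the row count and ERIC ids survived. `export_and_verify` is a hypothetical helper, and the toy data and temporary folder are for illustration only.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_and_verify(df, path):
    """Hypothetical helper: write df to CSV without the index, read it back,
    and confirm the row count and ERIC ids are unchanged."""
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path, dtype={"id": str})
    # Guard against an index column sneaking into the file
    if "Unnamed: 0" in reloaded.columns:
        raise ValueError("index column was written to the CSV")
    if len(reloaded) != len(df) or set(reloaded["id"]) != set(df["id"]):
        raise ValueError("row count or ids changed during export")
    return reloaded


# Usage with a toy DataFrame written to a temporary folder
toy = pd.DataFrame({"id": ["EJ061632", "EJ201539"], "source_search": ["Search 1", "Search 1"]})
with tempfile.TemporaryDirectory() as tmp:
    reloaded = export_and_verify(toy, Path(tmp) / "toy_export.csv")
```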

In [1]:
### Save ERIC Search Results (Phase 2b)
#from datetime import datetime
#from pathlib import Path
#print("Results saved on 8/2/2024 for analysis.")

# Count the number of articles for each unique value in the 'source_search' column
#articles_per_search = master_records['source_search'].value_counts()

# Print the number of articles for each search in order; reindex() reports 0 instead of raising a KeyError when a search has no articles
#sorted_searches = articles_per_search.reindex(['Search 1', 'Search 2', 'Search 3', 'Search 4'], fill_value=0)
#print("\nNumber of Articles in Each Search (source_search):")
#for search, count in sorted_searches.items():
#    print(f"{search}: {count:,} articles")

# Build a timestamped output path inside the "ERIC Searches" folder
#base_path = Path("ERIC Searches")
#current_time = datetime.now().strftime("%Y%m%d_%H%M%S")
#filename = f"Tracking_API_Output_Phase2b_Search{current_time}.csv"
#full_path = base_path / filename

# Export the master_records DataFrame to a CSV file (index=False avoids writing an unnamed index column)
#master_records.to_csv(full_path, index=False)
#print(f"ERIC search results data exported successfully to {full_path}")

#file_path = r"C:\Users\<user>\OneDrive - University of Georgia\Shared Folders\P-20 Parallels and Perils\Data Collection\Phase 2 - Literature Review Search\ERIC API Resources\ERIC Searches\Phase 2 Searches\Tracking_API_Output_Phase_2b_Search 1.csv"