# Data Preparation

The data needs a lot of work before I can fine-tune models on it. First, we read the xls. The file is on s3.

In [1]:
# enable AWS functionalities
!pip install boto3
!pip install s3fs

Collecting botocore<1.35.0,>=1.34.101 (from boto3)
  Using cached botocore-1.34.101-py3-none-any.whl (12.2 MB)
Installing collected packages: botocore
  Attempting uninstall: botocore
    Found existing installation: botocore 1.34.69
    Uninstalling botocore-1.34.69:
      Successfully uninstalled botocore-1.34.69
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
aiobotocore 2.12.3 requires botocore<1.34.70,>=1.34.41, but you have botocore 1.34.101 which is incompatible.[0m[31m
[0mSuccessfully installed botocore-1.34.101
Collecting botocore<1.34.70,>=1.34.41 (from aiobotocore<3.0.0,>=2.5.4->s3fs)
  Using cached botocore-1.34.69-py3-none-any.whl (12.0 MB)
Installing collected packages: botocore
  Attempting uninstall: botocore
    Found existing installation: botocore 1.34.101
    Uninstalling botocore-1.34.101:
      Successfully uninstalled botocore-1.3

In [8]:
import pandas as pd
df = pd.read_excel("s3://698modeldata/training_set_rel3.xls")

df.shape # confirming it's in

(12978, 28)

## Feature Extraction

I want to add features of the essay so that the model can (hopefully) perform better. First, the straightforward character and word count:

In [9]:
df["character_count"] = df["essay"].apply(len)
df["word_count"] = df["essay"].apply(lambda x: len(x.split()))

Now I add a column with a count of grammar and style errors

In [11]:
!pip install language_tool_python

Collecting language_tool_python
  Downloading language_tool_python-2.8-py3-none-any.whl (35 kB)
Installing collected packages: language_tool_python
Successfully installed language_tool_python-2.8


In [12]:
import numpy as np
import language_tool_python
from tqdm import tqdm

tool = language_tool_python.LanguageTool('en-US')

def count_errors_batch(essays):
    return [len(tool.check(essay)) for essay in essays]

# Doing it in batches to make it more managable
batch_size = 100

# number of batches needed (number of essays divided by batch size and rounded up)
n_batches = int(np.ceil(len(df) / batch_size))

error_counts = []
for i in tqdm(range(n_batches)):
    batch_start = i * batch_size
    batch_end = (i + 1) * batch_size
    batch = df["essay"][batch_start:batch_end]
    error_counts.extend(count_errors_batch(batch))

df["error_count"] = error_counts

Downloading LanguageTool 6.4: 100%|██████████| 246M/246M [00:07<00:00, 31.7MB/s]
INFO:language_tool_python.download_lt:Unzipping /tmp/tmp7w8jscax.zip to /root/.cache/language_tool_python.
INFO:language_tool_python.download_lt:Downloaded https://www.languagetool.org/download/LanguageTool-6.4.zip to /root/.cache/language_tool_python.
100%|██████████| 130/130 [39:31<00:00, 18.24s/it]


Let's quickly look at an example of the type of errors that might get returned:

In [13]:
tool = language_tool_python.LanguageTool('en-US')

essay_text = """Dear, Professor Miller, Dr. Smith, and Councilor Johnson, More and more people use computers, but not everyone agrees that this benefits society. Those who support advances in technology believe that computers have a positive effect on people. Others have different ideas. A great amount in the world today are using computers, some for work and some for the fun of it. Computers are one of man's greatest accomplishments. Computers are helpful in so many ways, including education, news, and live streams. Don't get me wrong, too many people spend time on the computer and they should be out interacting with others, but who are we to tell them what to do. When I grow up, I want to be an author or a journalist, and I know for a fact that both of those jobs involve lots of time on the computer, one more so than the other, but you know exactly what I'm getting at. So what if some experts think people are spending too much time on the computer and not exercising, enjoying nature, and interacting with family and friends. For all the expert knows, that's how most people make a living, and we don't know why people choose to use the computer for a great amount of time, and to be honest, it's none of my concern and it shouldn't be the so-called experts' concern. People interact a thousand times a day on the computers. Computers keep a lot of kids off the streets instead of being out and causing trouble. Computers help the FBI locate most wanted criminals. As you can see, computers are more useful to society than you think, computers benefit society."""

matches = tool.check(essay_text)

print(f"Number of issues found: {len(matches)}")
for match in matches:
    print(match)


Number of issues found: 2
Offset 569, length 4, Rule ID: COMMA_COMPOUND_SENTENCE
Message: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Suggestion: , and
...o many people spend time on the computer and they should be out interacting with oth...
                                           ^^^^
Offset 1215, length 4, Rule ID: COMMA_COMPOUND_SENTENCE
Message: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Suggestion: , and
...nd to be honest, it's none of my concern and it shouldn't be the so-called experts' ...
                                           ^^^^


One major concern I have is that essays are penalized for having placeholders (e.g. "@ORGANIZATION1"). I'll check:

In [14]:
import language_tool_python
tool = language_tool_python.LanguageTool("en-US")

def print_essay_errors(essay_text, essay_index):
    matches = tool.check(essay_text)
    print(f"Errors in Essay {essay_index}: {len(matches)} found")
    for match in matches:
        print(f"- {match.ruleId}: {match.message}")
        print(f"  Context: {match.context}")
    print("\n")

for index, row in df.head(10).iterrows():
    print_essay_errors(row["essay"], index)


Errors in Essay 0: 17 found
- SPACE_BEFORE_PARENTHESIS: It appears that a white space is missing.
  Context: ...w people, helps us learn about the globe(astronomy) and keeps us out of troble! T...
- MORFOLOGIK_RULE_EN_US: Possible spelling mistake found.
  Context: ...he globe(astronomy) and keeps us out of troble! Thing about! Dont you think so? How wo...
- EN_CONTRACTION_SPELLING: Possible spelling mistake found.
  Context: ...nd keeps us out of troble! Thing about! Dont you think so? How would you feel if you...
- EVERY_EVER: Did you mean “every”?
  Context: ...lways on the phone with friends! Do you ever time to chat with your friends or buisn...
- MORFOLOGIK_RULE_EN_US: Possible spelling mistake found.
  Context: ... ever time to chat with your friends or buisness partner about things. Well now - there'...
- QUESTION_MARK: If this is a question, use a question mark.
  Context: ...friends or buisness partner about things. Well now - there's a new way to chat th...
- MORFOLOGIK_RULE

And indeed, the essays aren't penalized for their placeholders. Let's add an error/word ratio, since a very long essay should be less penalized than a very short essay with the same number of errors:

In [15]:
df["error_to_word_ratio"] = df["error_count"] / df["word_count"]

Now let's add a readability level. That's fairly easily done, however, we'd also like a difference between the grade level of the student and the readibility level of the essay. So we also need to add a column with the grade level. We actually know the grade levels associateed with the different essay sets:

1. 8
2. 10
3. 10
4. 10
5. 8
6. 10
7. 7
8. 10

In [16]:
!pip install textstat

Collecting textstat
  Downloading textstat-0.7.3-py3-none-any.whl (105 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/105.1 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m105.1/105.1 kB[0m [31m3.2 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting pyphen (from textstat)
  Downloading pyphen-0.15.0-py3-none-any.whl (2.1 MB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/2.1 MB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m2.1/2.1 MB[0m [31m127.7 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.1/2.1 MB[0m [31m45.2 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: pyphen, textstat
Successfully installed pyphen-0.15.0 textstat-0.7.3


In [17]:
grade_level_map = {
    1: 8,
    2: 10,
    3: 10,
    4: 10,
    5: 8,
    6: 10,
    7: 7,
    8: 10
}

df["grade_level"] = df["essay_set"].map(grade_level_map)

import textstat
df["dale_chall_score"] = df["essay"].apply(textstat.dale_chall_readability_score)
df["complexity_difference"] = df["dale_chall_score"] - df["grade_level"]

Now, we want to calculate the different parts of speech present in essays. This could be quite informative; we might, for example, expect strong narrative essays to have more adjectives (all else being equal). We might also expect strong essays to have a good balance of parts of speech. So this is potentially quite informative. Again, though, we are really most interested in the proportions of parts of speech, and so that's what we'll calculate.

Note: the full list of tags can be found here: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html


In [18]:
!pip install nltk



In [19]:
import nltk
nltk.download("averaged_perceptron_tagger")
nltk.download("punkt")
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from collections import Counter

def count_pos_proportions(essay_text):
    tokens = word_tokenize(essay_text)
    tagged_tokens = pos_tag(tokens)
    total_tokens = len(tokens)
    pos_counts = Counter(tag for word, tag in tagged_tokens)
    pos_proportions = {tag: count / total_tokens for tag, count in pos_counts.items()}
    return pos_proportions

df["pos_proportions"] = df["essay"].apply(count_pos_proportions)

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Next we look at the dependency relations present in essays, again with a focus on proportions. We might expect stronger essays to have a broader range of dependency structures.

In [20]:
!pip install spacy



In [21]:
import spacy

nlp = spacy.load("en_core_web_sm") # load small English model

def normalize_dep_relations(text):
    essay = nlp(text)
    dep_counts = Counter(token.dep_ for token in essay) # iterate over each token, extract label, and count
    total_deps = sum(dep_counts.values())
    dep_proportions = {dep: count / total_deps for dep, count in dep_counts.items()} #normalizing
    return dep_proportions

df["dependency_proportions"] = df["essay"].apply(normalize_dep_relations)

At this point, linguistic features have been extracted from the essays. Data preparation work remains, but it is not linguistic in nature. Additionally, it is specific to the model that the data will be used for. As such, we'll split the data, and continue with the data preparation.

## Data Preparation by Essay Category

First, the dataframe needs to be split based on the essay category. The first two essays are persuasive, the next four are source-dependent, and the final two essays are narrative.

In [22]:
df_persuasive = df[df['essay_set'].isin([1, 2])]
df_source_dependent = df[df['essay_set'].isin([3, 4, 5, 6])]
df_narrative = df[df['essay_set'].isin([7, 8])]

print(df_persuasive.shape[0] + df_source_dependent.shape[0] + df_narrative.shape[0] == df.shape[0]) # just to address my paranoia

True


It actually may not matter if there are missing values, since ultimately only some values will be concatenated for the model input. Still, these dataframes are pretty unwieldy, and it's well worth cleaning them up some.

In [23]:
# calculate percent of missing values per column
def check_missing_values(df):
    na_percent = df.isna().sum() / len(df) * 100
    na_percent = na_percent[na_percent > 0].sort_values(ascending=False)
    print(na_percent)

print("Persuasive Essays Missing Data:")
check_missing_values(df_persuasive)

print("\nSource-Dependent Essays Missing Data:")
check_missing_values(df_source_dependent)

print("\nNarrative Essays Missing Data:")
check_missing_values(df_narrative)

Persuasive Essays Missing Data:
rater3_domain1    100.000000
rater2_trait3     100.000000
rater3_trait5     100.000000
rater3_trait4     100.000000
rater3_trait3     100.000000
rater3_trait2     100.000000
rater3_trait1     100.000000
rater2_trait6     100.000000
rater2_trait5     100.000000
rater2_trait4     100.000000
rater2_trait2     100.000000
rater2_trait1     100.000000
rater1_trait6     100.000000
rater1_trait5     100.000000
rater1_trait4     100.000000
rater1_trait3     100.000000
rater1_trait2     100.000000
rater1_trait1     100.000000
rater3_trait6     100.000000
rater1_domain2     49.762769
domain2_score      49.762769
rater2_domain2     49.762769
dtype: float64

Source-Dependent Essays Missing Data:
rater1_trait6     100.000000
rater2_trait1     100.000000
rater3_trait5     100.000000
rater3_trait4     100.000000
rater3_trait3     100.000000
rater3_trait2     100.000000
rater3_trait1     100.000000
rater2_trait6     100.000000
rater2_trait5     100.000000
rater2_trait4  

Columns will be removed if they are missing values for *all* rows.

In [24]:
def drop_all_missing(df):
  missing_percentage = (df.isna().sum() / len(df)) * 100 # same calculation as earlier
  columns_to_drop = missing_percentage[missing_percentage == 100].index
  df = df.drop(columns=columns_to_drop)

  return df

persuasive = drop_all_missing(df_persuasive)
print(persuasive.shape)

source_dependent = drop_all_missing(df_source_dependent)
print(source_dependent.shape)

narrative = drop_all_missing(df_narrative)
print(df_narrative.shape)


(3583, 18)
(7103, 15)
(2292, 37)


Now each of these dataframes needs to be split into 3 (training, validation, and testing subsets). Ultimately this will be a 70-15-15 split, but it will start with a 70-30 split with the latter subset being split 50-50. Critically, the sets for each dataframe need to have the same proportion of each essay set; this can be achieved by stratifying.

In [27]:
from sklearn.model_selection import train_test_split

def perform_split(features_df):
    train, temp = train_test_split(
        features_df,
        test_size=0.3,
        random_state=7,
        stratify=features_df['essay_set']
    )

    val, test = train_test_split(
        temp,
        test_size=0.5, # 30% split into 2, achieving the 70-15-15 split
        random_state=7,
        stratify=temp['essay_set']
    )

    return train, val, test

# Applying to each
train_persuasive, val_persuasive, test_persuasive = perform_split(persuasive)
train_source_dependent, val_source_dependent, test_source_dependent = perform_split(source_dependent)
train_narrative, val_narrative, test_narrative = perform_split(narrative)




Some features will now be scaled. Specifically: 'character_count', 'word_count', 'error_count', and 'error_to_word_ratio'. There are, of course, other numeric features, but scaling grade level and Dale-Chall score could be misleading. These features in the validation and test sets are transformed using the parameters learned when scaling the training data. This avoids data leakage.

In [28]:
from sklearn.preprocessing import StandardScaler

def scale_features(train_df, val_df, test_df, numeric_features):
    scaler = StandardScaler()
    for feature in numeric_features:
        train_df[feature + '_scaled'] = scaler.fit_transform(train_df[[feature]])
        val_df[feature + '_scaled'] = scaler.transform(val_df[[feature]])
        test_df[feature + '_scaled'] = scaler.transform(test_df[[feature]])

        # Remove original columns
        train_df.drop(columns=[feature], inplace=True)
        val_df.drop(columns=[feature], inplace=True)
        test_df.drop(columns=[feature], inplace=True)

    return train_df, val_df, test_df

numeric_features = ['character_count', 'word_count', 'error_count', 'error_to_word_ratio']

# Calling the function for all sets
train_persuasive, val_persuasive, test_persuasive = scale_features(
    train_persuasive, val_persuasive, test_persuasive,
    numeric_features
)

train_source_dependent, val_source_dependent, test_source_dependent = scale_features(
    train_source_dependent, val_source_dependent, test_source_dependent,
    numeric_features
)

train_narrative, val_narrative, test_narrative = scale_features(
    train_narrative, val_narrative, test_narrative,
    numeric_features
)


At this point, it's worth reviewing the nature of the different essays. This is important because different essays have different criteria, scoring logics, and more. This text is not critical for understanding the code ahead, and can be skipped without missing too much information.

Note: the texts in prompts 4, 6, and 7 were summarized using ChatGPT in order to have shorter prompts in the model input.

**Essay 1**

*Prompt 1:*

More and more people use computers, but not everyone agrees that this benefits society. Those who support advances in technology believe that computers have a positive effect on people. They teach hand-eye coordination, give people the ability to learn about faraway places and people, and even allow people to talk online with other people. Others have different ideas. Some experts are concerned that people are spending too much time on their computers and less time exercising, enjoying nature, and interacting with family and friends.

Write a letter to your local newspaper in which you state your opinion on the effects computers have on people. Persuade the readers to agree with you.

*Criteria 1:*

1: An undeveloped response that may take a position but offers no more than very minimal support. Typical elements: Contains few or vague details. Is awkward and fragmented. May be difficult to read and understand. May show no awareness of audience.

2: An under-developed response that may or may not take a position. Typical elements: Contains only general reasons with unelaborated and/or list-like details. Shows little or no evidence of organization. May be awkward and confused or simplistic. May show little awareness of audience.

3: A minimally-developed response that may take a position, but with inadequate support and details. Typical elements: Has reasons with minimal elaboration and more general than specific details. Shows some organization. May be awkward in parts with few transitions. Shows some awareness of audience.

4: A somewhat-developed response that takes a position and provides adequate support. Typical elements: Has adequately elaborated reasons with a mix of general and specific details. Shows satisfactory organization. May be somewhat fluent with some transitional language. Shows adequate awareness of audience.

5: A developed response that takes a clear position and provides reasonably persuasive support. Typical elements: Has moderately well elaborated reasons with mostly specific details. Exhibits generally strong organization. May be moderately fluent with transitional language throughout. May show a consistent awareness of audience.

6: A well-developed response that takes a clear and thoughtful position and provides persuasive support. Typical elements: Has fully elaborated reasons with specific details. Exhibits strong organization. Is fluent and uses sophisticated transitional language. May show a heightened awareness of audience.

*Scoring Logic*

resolved_score = rater1_domain1 + rater2_domain1

*Range*

2-12

**Essay 2**

Prompt 2:

"All of us can think of a book that we hope none of our children or any other children have taken off the shelf. But if I have the right to remove that book from the shelf -- that work I abhor -- then you also have exactly the same right and so does everyone else. And then we have no books left on the shelf for any of us." --Katherine Paterson, Author


Write a persuasive essay to a newspaper reflecting your vies on censorship in libraries. Do you believe that certain materials, such as books, music, movies, magazines, etc., should be removed from the shelves if they are found offensive? Support your position with convincing arguments from your own experience, observations, and/or reading.

Criteria 2:

*Domain 1: Writing Applications*

Ideas and Content : Does the writing sample fully accomplish the task (e.g., support an opinion, summarize, tell a story, or write an article)? Does the writing sample include thorough, relevant, and complete ideas?

Organization: Are the ideas in the writing sample organized logically?

Style: Does the writing sample exhibit exceptional word usage? Does the writing sample demonstrate exceptional writing technique?

Voice: Does the writing sample demonstrate effective adjustment of language and tone to task and reader?

6: A Score Point 6 paper is rare. It fully accomplishes the task in a thorough and insightful manner and has a distinctive quality that sets it apart as an outstanding performance.

5: A Score Point 5 paper represents a solid performance. It fully accomplishes the task, but lacks the overall level of sophistication and consistency of a Score Point 6 paper.

4: A Score Point 4 paper represents a good performance. It accomplishes the task, but generally needs to exhibit more development, better organization, or a more sophisticated writing style to receive a higher score.

3: A Score Point 3 paper represents a performance that minimally accomplishes the task. Some elements of development, organization, and writing style are weak.

2: A Score Point 2 paper represents a performance that only partially accomplishes the task. Some responses may exhibit difficulty maintaining a focus. Others may be too brief to provide sufficient development of the topic or evidence of adequate organizational or writing style.

1: A Score Point 1 paper represents a performance that fails to accomplish the task. It exhibits considerable difficulty in areas of development, organization, and writing style. The writing is generally either very brief or rambling and repetitive, sometimes resulting in a response that may be difficult to read or comprehend.

*Domain 2: Language Conventions*

4: Does the writing sample exhibit a superior command of language skills? A Score Point 4 paper exhibits a superior command of written English language conventions. The paper provides evidence that the student has a thorough control of the concepts outlined in the Indiana Academic Standards associated with the student’s grade level. In a Score Point 4 paper, there are no errors that impair the flow of communication. Errors are generally of the first-draft variety or occur when the student attempts sophisticated sentence construction.

3: Does the writing sample exhibit a good control of language skills? In a Score Point 3 paper, errors are occasional and are often of the first-draft variety; they have a minor impact on the flow of communication.

2: Does the writing sample exhibit a fair control of language skills? In a Score Point 2 paper, errors are typically frequent and may occasionally impede the flow of communication.

1: Does the writing sample exhibit a minimal or less than minimal control of language skills? In a Score Point 1 paper, errors are serious and numerous. The reader may need to stop and reread part of the sample and may struggle to discern the writer’s meaning.

*Scoring Logic*

There are two resolved scores for this essay set: domain1_score and domain2_score

resolved_score2d1 resolved_score2d2

*Range*

d1: 1-6 d

2: 1-4

**Essay 3**



Prompt 3:

Read the story "Rough Road Ahead" by Joe Kurmaskie.

``The story follows Joe Kurmaskie, a solo cyclist, who finds himself in a challenging situation due to misguided advice from a group of elderly locals he encountered at a campground near Lodi, California. The old men recommend a "shortcut" to Yosemite National Park, which Joe decides to take the following morning. This route initially seems promising but quickly leads him into difficulty.

Joe's journey takes him to a ghost town, where he begins to doubt the accuracy of the old men's directions. As he continues, the terrain becomes more demanding, and he struggles with heat and inadequate water supplies. He reaches a dilapidated water pump in a deserted area, only to find that the water it produces is nearly undrinkable.

Pushing forward, Joe encounters increasingly rough roads and experiences severe dehydration, which is exacerbated by the intense California heat. His situation becomes dire as he realizes the supposed nearby town is much farther away than he had been led to believe.

In a twist of irony, Joe comes across an abandoned building, which turns out to be a former Welch’s Grape Juice factory—a mocking reminder of his thirst. Driven by desperation, he sucks on pebbles to stimulate saliva production and mitigate his thirst.

Eventually, Joe reaches a fish camp where he finally finds relief and water. The story concludes with Joe reflecting on his ordeal and resolving to rely solely on his own map in the future, rather than trusting dubious advice.``

Write a response that explains how the features of the setting affect the cyclist. In your response, include examples from the essay that support your conclusion.

*Criteria 3:*

3: The response demonstrates an understanding of the complexities of the text. Addresses the demands of the question Uses expressed and implied information from the text Clarifies and extends understanding beyond the literal

2: The response demonstrates a partial or literal understanding of the text. Addresses the demands of the question, although may not develop all parts equally Uses some expressed or implied information from the text to demonstrate understanding May not fully connect the support to a conclusion or assertion made about the text(s)

1: The response shows evidence of a minimal understanding of the text. May show evidence that some meaning has been derived from the text May indicate a misreading of the text or the question May lack information or explanation to support an understanding of the text in relation to the question

0: The response is completely irrelevant or incorrect, or there is no response.

*Scoring Logic 3*

Because we don't have a third rater, we will calculate the resolved score as the maximum of the two raters' scores if they are adjacent or equal; else, average and round the two scores (typically just the average).


*Range*

0-3

**Essay 4**

*Prompt 4:*

Read the excerpt of "The Winter Hibiscus" by Minfong Ho

`` Saeng, a teenage girl, and her family have moved to the United States from Vietnam. As Saeng walks home after failing her driver's test, she sees a familiar plant. Later, shegoes to a florist shop to see if the plant can be purchased.

Saeng, a character deeply connected to her cultural roots, experiences a poignant moment of nostalgia and loss during a visit to a greenhouse. Surrounded by familiar plants from her childhood, she is particularly drawn to a hibiscus, reminiscent of the vibrant flowers from her youth in her home country. This single blossom evokes vivid memories of her family's garden, the rituals she performed there, and the natural beauty that once surrounded her.

The narrative weaves through her emotional journey as she encounters other familiar plants, each stirring memories of home and her past life, especially the jasmine plant, which brings back specific memories of her grandmother. The sensory details of the plants' sights and smells trigger a cascade of emotions, leading her to purchase a hibiscus plant, despite its high cost, as a way to hold onto her past.

At home, her interaction with her mother reveals another layer of her struggle: she has failed an important test, adding to her sense of loss. However, the story ends on a note of hopeful resilience. Saeng plants the hibiscus in her new environment, symbolizing her attempt to root herself in new soil while preserving connections to her heritage. She resolves to retake the test, inspired by the cyclic return of the geese, signaling renewal and the continuation of life's cycles. ``

Read the last paragraph of the story.

"When they come back, Saeng vowed silently to herself, in the spring, when the snows melt and the geese return and this hibiscus is budding, then I will take that test again."

Write a response that explains why the author concludes the story with this paragraph. In your response, include details and examples from the story that support your ideas.

*Criteria 4:*

3: The response demonstrates an understanding of the complexities of the text. Addresses the demands of the question Uses expressed and implied information from the text Clarifies and extends understanding beyond the literal

2: The response demonstrates a partial or literal understanding of the text. Addresses the demands of the question, although may not develop all parts equally Uses some expressed or implied information from the text to demonstrate understanding May not fully connect the support to a conclusion or assertion made about the text(s)

1: The response shows evidence of a minimal understanding of the text. May show evidence that some meaning has been derived from the text May indicate a misreading of the text or the question May lack information or explanation to support an understanding of the text in relation to the question

0: The response is completely irrelevant or incorrect, or there is no response.

*Scoring Logic 4*

Again, because we don't have a third rater, we will calculate the resolved score as the maximum of the two raters' scores if they are adjacent or equal; else, average and round the two scores (typically just the average).

*Range*

0-3

**Essay 5**

*Prompt 5:*

Read "Home: The Blueprints of Our Lives" by Narciso Rodriguez.

My parents, originally from Cuba, arrived in the United States in 1956. After living for a year in a furnished one-room apartment, twenty-one-year-old Rawedia Maria and twenty-seven-year-old Narciso Rodriguez, Sr., could afford to move into a modest, three-room apartment I would soon call home. In 1961, I was born into this simple house, situated in a two-family, blond-brick building in the Ironbound section of Newark, New Jersey. Within its walls, my young parents created our traditional Cuban home, the very heart of which was the kitchen. My parents both shared cooking duties and unwittingly passed on to me their rich culinary skills and a love of cooking that is still with me today (and for which I am eternally grateful). Passionate Cuban music (which I adore to this day) filled the air, mixing with the aromas of the kitchen. Here, the innocence of childhood, the congregation of family and friends, and endless celebrations that encompassed both, formed the backdrop to life in our warm home. Growing up in this environment instilled in me a great sense that “family” had nothing to do with being a blood relative. Quite the contrary, our neighborhood was made up of mostly Spanish, Cuban, and Italian immigrants at a time when overt racism was the norm and segregation prevailed in the United States. In our neighborhood, despite customs elsewhere, all of these cultures came together in great solidarity and friendship. It was a close-knit community of honest, hardworking immigrants who extended a hand to people who, while not necessarily their own kind, were clearly in need. Our landlord and his daughter, Alegria (my babysitter and first friend), lived above us, and Alegria graced our kitchen table for meals more often than not. Also at the table were Sergio and Edelmira, my surrogate grandparents who lived in the basement apartment. (I would not know my “real” grandparents, Narciso the Elder and Consuelo, until 1970 when they were allowed to leave Cuba.) My aunts Bertha and Juanita and my cousins Arnold, Maria, and Rosemary also all lived nearby and regularly joined us at our table. Countless extended family members came and went — and there was often someone staying with us temporarily until they were able to get back on their feet. My parents always kept their arms and their door open to the many people we considered family, knowing that they would do the same for us.

Describe the mood created by the author in the memoir. Support your answer with relevant and specific information from the memoir.


*Criteria 5:*

4: The response is a clear, complete, and accurate description of the mood created by the author. The response includes relevant and specific information from the memoir.

3: The response is a mostly clear, complete, and accurate description of the mood created by the author. The response includes relevant but often general information from the memoir.

2: The response is a partial description of the mood created by the author. The response includes limited information from the memoir and may include misinterpretations.

1: The response is a minimal description of the mood created by the author. The response includes little or no information from the memoir and may include misinterpretations. OR The response relates minimally to the task.

0: The response is incorrect or irrelevant or contains insufficient information to demonstrate comprehension.

*Scoring Logic*

We need two scores

The resolved score is the maximum of the two raters' scores.

*Range*

0-4

**Essay 6**

*Prompt 6:*

`` “The Mooring Mast” discusses how The Empire State Building, originally envisioned to be the world's tallest building, was conceived during a time of intense architectural rivalry, particularly with the Chrysler Building, which was under construction at the same time. The Chrysler Building's architect added a secret 185-foot spire, temporarily claiming the title for the tallest structure. This move spurred the Empire State Building's planners, led by former New York Governor Al Smith, to push the design even further, ultimately setting its new height at 1,250 feet.

The increased height included an ambitious plan for a mooring mast at the top of the building, which was intended to serve as a docking station for dirigibles, or zeppelins. This feature was inspired by the growing interest in airship travel in the 1920s, which was seen as the future of transatlantic transportation. The dirigibles, large airships powered by engines and capable of carrying passengers in a gondola below, could theoretically dock at the Empire State Building, allowing passengers to embark and disembark right in the heart of Manhattan.

The idea was that the Empire State Building would not just be a static office building, but a dynamic transportation hub. It was to be equipped with facilities to handle airship passengers, including customs and ticketing areas on the 86th and 101st floors, effectively integrating the building into the emerging global travel network.

However, the practical realization of this vision faced numerous challenges. The architects and engineers had to consider how to securely anchor a thousand-foot-long dirigible without it posing a risk to the building and the city below. This required substantial modifications to the building's structure to handle the added stress of a moored airship, particularly in managing the forces exerted by winds at such a height.

Despite the meticulous planning and consultation with experts, including tours of naval airship operations, the idea faced insurmountable hurdles. The greatest obstacle was the inherent dangers posed by the flammable hydrogen gas used by most dirigibles (helium, a safer alternative, was in scarce supply). The tragic destruction of the Hindenburg in 1937 underscored the potential risks of docking airships in densely populated areas and ultimately sounded the death knell for the mooring mast concept.

Additionally, natural factors such as the unpredictable and often violent winds at the top of the building made mooring safely a near impossibility. Legal restrictions on low-flying aircraft over populated areas further complicated the situation. While there were attempts to use the mast—like the Goodyear blimp Columbia's stunt of delivering newspapers—the practical use of the mooring mast for airship docking was never realized.

By the late 1930s, with the rapid advancements in airplane technology, the age of the dirigible was coming to an end, rendering the mooring mast obsolete. The areas of the building designated for airship passengers were eventually repurposed for public use, including a high-altitude soda fountain and tea garden. The open observation deck, which was intended for airship passengers, has remained closed to the public.

In conclusion, the mooring mast of the Empire State Building stands as a fascinating example of ambitious architectural planning that failed to materialize. It reflects the optimism and forward-thinking of an era that envisioned a future of airships integrating with the urban landscape. While the mast was never used for its intended purpose, it remains an iconic part of the New York City skyline and a testament to the audacious dreams of its creators. The story of the Empire State Building’s mooring mast is a poignant reminder of the limits of contemporary technology and the unpredictability of progress. ``

Based on the excerpt, describe the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. Support your answer with relevant and specific information from the excerpt.

*Criteria 6:*

4: The response is a clear, complete, and accurate description of the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. The response includes relevant and specific information from the excerpt.

3: The response is a mostly clear, complete, and accurate description of the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. The response includes relevant but often general information from the excerpt.

2: The response is a partial description of the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. The response includes limited information from the excerpt and may include misinterpretations.

1: The response is a minimal description of the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. The response includes little or no information from the excerpt and may include misinterpretations. OR The response relates minimally to the task.

0: The response is totally incorrect or irrelevant, or contains insufficient evidence to demonstrate comprehension.

*Scoring Logic 6*

We need two scores.

The resolved score is the maximum of the two raters' scores

*Range*

0-4

**Essay 7**

*Prompt 7:*

Write about patience. Being patient means that you are understanding and tolerant. A patient person experience difficulties without complaining. Do only one of the following: write a story about a time when you were patient OR write a story about a time when someone you know was patient OR write a story in your own way about patience.

*Criteria 7:*

Ideas

3: Tells a story with ideas that are clearly focused on the topic and are thoroughly developed with specific, relevant details.

2: Tells a story with ideas that are somewhat focused on the topic and are developed with a mix of specific and/or general details.

1: Tells a story with ideas that are minimally focused on the topic and developed with limited and/or general details. 0: Ideas are not focused on the task and/or are undeveloped.

Organization

3: Organization and connections between ideas and/or events are clear and logically sequenced.

2: Organization and connections between ideas and/or events are logically sequenced.

1: Organization and connections between ideas and/or events are weak.

0: No organization evident.

Style

3: Command of language, including effective and compelling word choice and varied sentence structure, clearly supports the writer's purpose and audience.

2: Adequate command of language, including effective word choice and clear sentences, supports the writer's purpose and audience.

1: Limited use of language, including lack of variety in word choice and sentences, may hinder support for the writer's purpose and audience.

0: Ineffective use of language for the writer's purpose and audience.

Conventions

3: Consistent, appropriate use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation for the grade level.

2: Adequate use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation for the grade level.

1: Limited use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation for the grade level.

0: Ineffective use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation.

*Scoring Logic 7*

The resolved score is the sum of two raters' scores which are the sums of the raters' scores for each of the aforementioned traits.

*Range*

0-30

**Essay 8**

Prompt 8: We all understand the benefits of laughter. For example, someone once said, “Laughter is the shortest distance between two people.” Many other people believe that laughter is an important part of any relationship. Tell a true story in which laughter was one element or part.

*Criteria 8:*

Ideas and Content

6: The writing is exceptionally clear, focused, and interesting. It holds the reader’s attention throughout. Main ideas stand out and are developed by strong support and rich details suitable to audience and purpose.

5: The writing is clear, focused and interesting. It holds the reader’s attention. Main ideas stand out and are developed by supporting details suitable to audience and purpose.

4: The writing is clear and focused. The reader can easily understand the main ideas. Support is present, although it may be limited or rather general.

3: The reader can understand the main ideas, although they may be overly broad or simplistic, and the results may not be effective. Supporting detail is often limited, insubstantial, overly general, or occasionally slightly off-topic.

2: Main ideas and purpose are somewhat unclear or development is attempted but minimal.

1: The writing lacks a central idea or purpose.

Organization

6: The organization enhances the central idea(s) and its development. The order and structure are compelling and move the reader through the text easily.

5: The organization enhances the central idea(s) and its development. The order and structure are strong and move the reader through the text.

4: Organization is clear and coherent. Order and structure are present, but may seem formulaic.

3: An attempt has been made to organize the writing; however, the overall structure is inconsistent or skeletal.

2: The writing lacks a clear organizational structure. An occasional organizational device is discernible; however, the writing is either difficult to follow and the reader has to reread substantial portions, or the piece is simply too short to demonstrate organizational skills.

1: The writing lacks coherence; organization seems haphazard and disjointed. Even after rereading, the reader remains confused.

Sentence Fluency

6: The writing has an effective flow and rhythm. Sentences show a high degree of craftsmanship, with consistently strong and varied structure that makes expressive oral reading easy and enjoyable.

5: The writing has an easy flow and rhythm. Sentences are carefully crafted, with strong and varied structure that makes expressive oral reading easy and enjoyable.

4: The writing flows; however, connections between phrases or sentences may be less than fluid. Sentence patterns are somewhat varied, contributing to ease in oral reading.

3: The writing tends to be mechanical rather than fluid. Occasional awkward constructions may force the reader to slow down or reread.

2: The writing tends to be either choppy or rambling. Awkward constructions often force the reader to slow down or reread.

1: The writing is difficult to follow or to read aloud. Sentences tend to be incomplete, rambling, or very awkward.

Conventions

6: The writing demonstrates exceptionally strong control of standard writing conventions (e.g., punctuation, spelling, capitalization, grammar and usage) and uses them effectively to enhance communication. Errors are so few and so minor that the reader can easily skim right over them unless specifically searching for them.

5: The writing demonstrates strong control of standard writing conventions (e.g., punctuation, spelling, capitalization, grammar and usage) and uses them effectively to enhance communication. Errors are few and minor. Conventions support readability.

4: The writing demonstrates control of standard writing conventions (e.g., punctuation, spelling, capitalization, grammar and usage). Significant errors do not occur frequently. Minor errors, while perhaps noticeable, do not impede readability.

3: The writing demonstrates limited control of standard writing conventions (e.g., punctuation, spelling, capitalization, grammar and usage). Errors begin to impede readability.

2: The writing demonstrates little control of standard writing conventions. Frequent, significant errors impede readability.

1: Numerous errors in usage, spelling, capitalization, and punctuation repeatedly distract the reader and make the text difficult to read. In fact, the severity and frequency of errors are so overwhelming that the reader finds it difficult to focus on the message and must reread for meaning.

*Scoring Logic 8*

Again, because we don't have a straightforward way to incorporate the third rater, the scoring logic is simplified. The resolved score is the sum of both raters' scores. Each rater's score is the sum of their score for each trait (except for their score for conventions which is doubled).

*Range*

0-60

Given the information above, it becomes clear which subscores need to be predicted for each essay set so that the scoring logic for that essay can be used to calculate its resolved score.

As such, each dataframe will be expanded so that there are as many rows for each essay as target scores that will be calculated.

In [29]:
def expand_essays_to_scores_by_set(df, score_columns_dict):
    expanded_rows = []
    for idx, row in df.iterrows():
        score_columns = score_columns_dict.get(row['essay_set'], [])
        for score_column in score_columns:
            new_row = row.copy()
            new_row['target_score'] = new_row[score_column] # Also add a target row column with the type of score that will be predicted
            # Also, set all other scores to None except for the target score column
            for sc in score_columns:
                if sc != score_column:
                    new_row[sc] = None
            new_row['score_type'] = score_column
            expanded_rows.append(new_row)
    return pd.DataFrame(expanded_rows)


# Dictionary with scores for each essay set (see the information above if unclear):

score_columns_dict = {
    1: ['rater1_domain1', 'rater2_domain1'],
    2: ['domain1_score', 'domain2_score'],
    3: ['rater1_domain1', 'rater2_domain1'],
    4: ['rater1_domain1', 'rater2_domain1'],
    5: ['rater1_domain1', 'rater2_domain1'],
    6: ['rater1_domain1', 'rater2_domain1'],
    7: ['rater1_trait1', 'rater1_trait2', 'rater1_trait3', 'rater1_trait4', 'rater2_trait1', 'rater2_trait2', 'rater2_trait3', 'rater2_trait4'],
    8: ['rater1_trait1', 'rater1_trait2', 'rater1_trait5', 'rater1_trait6', 'rater2_trait1', 'rater2_trait2', 'rater2_trait5', 'rater2_trait6']
}


# Calling the function

train_persuasive_expanded = expand_essays_to_scores_by_set(train_persuasive, score_columns_dict)
val_persuasive_expanded = expand_essays_to_scores_by_set(val_persuasive, score_columns_dict)
test_persuasive_expanded = expand_essays_to_scores_by_set(test_persuasive, score_columns_dict)

train_source_dependent_expanded = expand_essays_to_scores_by_set(train_source_dependent, score_columns_dict)
val_source_dependent_expanded = expand_essays_to_scores_by_set(val_source_dependent, score_columns_dict)
test_source_dependent_expanded = expand_essays_to_scores_by_set(test_source_dependent, score_columns_dict)


train_narrative_expanded = expand_essays_to_scores_by_set(train_narrative, score_columns_dict)
val_narrative_expanded = expand_essays_to_scores_by_set(val_narrative, score_columns_dict)
test_narrative_expanded = expand_essays_to_scores_by_set(test_narrative, score_columns_dict)



In [32]:
print(train_persuasive_expanded.head(6))

      essay_id  essay_set                                              essay  \
2022      3217          2  Whispers and comments fill the room while peop...   
2022      3217          2  Whispers and comments fill the room while peop...   
737        740          1  Dear @CAPS1 @CAPS2, I think that children are ...   
737        740          1  Dear @CAPS1 @CAPS2, I think that children are ...   
2960      4155          2  I think that censorship in libraries is and is...   
2960      4155          2  I think that censorship in libraries is and is...   

      rater1_domain1  rater2_domain1  domain1_score  rater1_domain2  \
2022             4.0             4.0            4.0             4.0   
2022             4.0             4.0            NaN             4.0   
737              4.0             NaN            8.0             NaN   
737              NaN             4.0            8.0             NaN   
2960             3.0             3.0            3.0             2.0   
2960         

It is important that the dataframe include the scoring criteria and range for each combination of essay_set and score_type.

In [34]:
def assign_criteria_and_ranges(df):
    for index, row in df.iterrows():
        essay_set = row['essay_set']
        score_type = row['score_type']

        # Initialize default values
        criteria = "Not specified"
        score_range = (None, None)

        # Define conditions based on essay set (and sometimes score types)
        if essay_set == 1:
            criteria = """'Scoring is based on the development and support of positions, organization, fluency, and awareness of audience.
Scores:
1 - An undeveloped response that may take a position but offers no more than very minimal support. Typical elements: Contains few or vague details. Is awkward and fragmented. May be difficult to read and understand. May show no awareness of audience.
2 - An under-developed response that may or may not take a position. Typical elements: Contains only general reasons with unelaborated and/or list-like details. Shows little or no evidence of organization. May be awkward and confused or simplistic. May show little awareness of audience.
3 - A minimally-developed response that may take a position, but with inadequate support and details. Typical elements: Has reasons with minimal elaboration and more general than specific details. Shows some organization. May be awkward in parts with few transitions. Shows some awareness of audience.
4 - A somewhat-developed response that takes a position and provides adequate support. Typical elements: Has adequately elaborated reasons with a mix of general and specific details. Shows satisfactory organization. May be somewhat fluent with some transitional language. Shows adequate awareness of audience.
5 - A developed response that takes a clear position and provides reasonably persuasive support. Typical elements: Has moderately well elaborated reasons with mostly specific details. Exhibits generally strong organization. May be moderately fluent with transitional language throughout. May show a consistent awareness of audience.'
6 - A well-developed response that takes a clear and thoughtful position and provides persuasive support. Typical elements: Has fully elaborated reasons with specific details. Exhibits strong organization. Is fluent and uses sophisticated transitional language. May show a heightened awareness of audience."""
            score_range = (1, 6)
        elif essay_set == 2:
            if score_type == 'domain1_score':
                criteria = """Writing Applications:
Ideas and Content: Does the writing sample fully accomplish the task (e.g., support an opinion, summarize, tell a story, or write an article)? Does the writing sample include thorough, relevant, and complete ideas?
Organization: Are the ideas in the writing sample organized logically?
Style: Does the writing sample exhibit exceptional word usage? Does the writing sample demonstrate exceptional writing technique?
Voice: Does the writing sample demonstrate effective adjustment of language and tone to task and reader?
Scores:
1 - A performance that fails to accomplish the task. It exhibits considerable difficulty in areas of development, organization, and writing style. The writing is generally either very brief or rambling and repetitive, sometimes resulting in a response that may be difficult to read or comprehend.
2 - A performance that only partially accomplishes the task. Some responses may exhibit difficulty maintaining a focus. Others may be too brief to provide sufficient development of the topic or evidence of adequate organizational or writing style.
3 - A performance that minimally accomplishes the task. Some elements of development, organization, and writing style are weak.
4 - A good performance. It accomplishes the task, but generally needs to exhibit more development, better organization, or a more sophisticated writing style to receive a higher score.
5 - A solid performance. It fully accomplishes the task, but lacks the overall level of sophistication and consistency of a Score Point 6 paper.
6 - Rare. A performance that fully accomplishes the task in a thorough and insightful manner and has a distinctive quality that sets it apart as an outstanding performance."""
                score_range = (1, 6)
            elif score_type == 'domain2_score':
                criteria = """Language Conventions:
The writing sample's control of language skills, including grammar, spelling, punctuation, and overall sentence structure, as related to the Indiana Academic Standards."
Scores:
1 - The writing sample exhibits a minimal or less than minimal control of language skills. Errors are serious and numerous. The reader may need to stop and reread part of the sample and may struggle to discern the writer’s meaning.
2 - The writing sample exhibits a fair control of language skills. Errors are typically frequent and may occasionally impede the flow of communication.
3 - The writing sample exhibits a good control of language skills. Errors are occasional and are often of the first-draft variety; they have a minor impact on the flow of communication.
4 - The writing sample exhibits a superior command of language skills and written English language conventions. A Score Point 4 paper provides evidence that the student has a thorough control of the concepts outlined in the Indiana Academic Standards associated with the student’s grade level. There are no errors that impair the flow of communication. Errors are generally of the first-draft variety or occur when the student attempts sophisticated sentence construction."""
                score_range = (1, 4)
        elif essay_set == 3:
            criteria = """The response should address how the setting influences the cyclist's experiences, focusing on the interplay between the environment and the cyclist's decisions and emotions.
0 - The response is completely irrelevant or incorrect, or there is no response.',
1 - The response shows evidence of a minimal understanding of the text. May show evidence that some meaning has been derived from the text. May indicate a misreading of the text or the question. May lack information or explanation to support an understanding of the text in relation to the question.
2 - The response demonstrates a partial or literal understanding of the text. Addresses the demands of the question, although may not develop all parts equally. Uses some expressed or implied information from the text to demonstrate understanding. May not fully connect the support to a conclusion or assertion made about the text(s).
3 - The response demonstrates an understanding of the complexities of the text. Addresses the demands of the question. Uses expressed and implied information from the text. Clarifies and extends understanding beyond the literal."""
            score_range = (0, 3)
        elif essay_set == 4:
            criteria = """The response should explore the thematic significance of the concluding paragraph, relating it to the broader narrative and character development.
0 - The response is completely irrelevant or incorrect, or there is no response.',
1 - The response shows evidence of a minimal understanding of the text. May show evidence that some meaning has been derived from the text. May indicate a misreading of the text or the question. May lack information or explanation to support an understanding of the text in relation to the question.
2 - The response demonstrates a partial or literal understanding of the text. Addresses the demands of the question, although may not develop all parts equally. Uses some expressed or implied information from the text to demonstrate understanding. May not fully connect the support to a conclusion or assertion made about the text(s).
3 - The response demonstrates an understanding of the complexities of the text. Addresses the demands of the question. Uses expressed and implied information from the text. Clarifies and extends understanding beyond the literal."""
            score_range = (0, 3)
        elif essay_set == 5:
            criteria = """The response should effectively capture and articulate the mood created by the author, utilizing specific details and instances from the text to support the description.
Scores:
0 - The response is incorrect or irrelevant or contains insufficient information to demonstrate comprehension.',
1 - The response is a minimal description of the mood created by the author. The response includes little or no information from the memoir and may include misinterpretations. OR The response relates minimally to the task.'
2 - The response is a partial description of the mood created by the author. The response includes limited information from the memoir and may include misinterpretations.
3 - The response is a mostly clear, complete, and accurate description of the mood created by the author. The response includes relevant but often general information from the memoir.
4 - The response is a clear, complete, and accurate description of the mood created by the author. The response includes relevant and specific information from the memoir."""
            score_range = (0, 4)
        elif essay_set == 6:
            criteria = """The response should detail the technical and logistical challenges involved in the Empire State Building's mooring mast project, drawing directly from the text to support the analysis.
Scores:
0 - The response is totally incorrect or irrelevant, or contains insufficient evidence to demonstrate comprehension.
1 - The response is a minimal description of the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. The response includes little or no information from the excerpt and may include misinterpretations. OR The response relates minimally to the task.
2 - The response is a partial description of the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. The response includes limited information from the excerpt and may include misinterpretations.
3 - The response is a mostly clear, complete, and accurate description of the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. The response includes relevant but often general information from the excerpt.
4 - The response is a clear, complete, and accurate description of the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. The response includes relevant and specific information from the excerpt.'"""
            score_range = (0, 4)
        elif essay_set == 7:
            # Similar setup as shown previously for traits
            if score_type in ['rater1_trait1', 'rater2_trait1']:
                criteria = """
Scores:
0 - Ideas are not focused on the task and/or are undeveloped.
1 - Tells a story with ideas that are minimally focused on the topic and developed with limited and/or general details.
2 - Tells a story with ideas that are somewhat focused on the topic and are developed with a mix of specific and/or general details.
3 - Tells a story with ideas that are clearly focused on the topic and are thoroughly developed with specific, relevant details."""
                score_range = (0, 3)
            elif score_type in ['rater1_trait2', 'rater2_trait2']:
                criteria = """
Scores:
0: - No organization evident.
1 - Organization and connections between ideas and/or events are weak.
2 - Organization and connections between ideas and/or events are logically sequenced.
3 - Organization and connections between ideas and/or events are clear and logically sequenced."""
                score_range = (0, 3)
            elif score_type in ['rater1_trait3', 'rater2_trait3']:
                criteria = """
Scores:
0 - Ineffective use of language for the writer's purpose and audience.
1 - Limited use of language, including lack of variety in word choice and sentences, may hinder support for the writer's purpose and audience.
2 - Adequate command of language, including effective word choice and clear sentences, supports the writer's purpose and audience.
3 - Command of language, including effective and compelling word choice and varied sentence structure, clearly supports the writer's purpose and audience."""
                score_range = (0, 3)
            elif score_type in ['rater1_trait4', 'rater2_trait4']:
                criteria = """
Scores:
0 - Ineffective use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation.
1 - Limited use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation for the grade level.
2 - Adequate use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation for the grade level.
3 - Consistent, appropriate use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation for the grade level.'"""
                score_range = (0, 3)
        elif essay_set == 8:
            if score_type in ['rater1_trait1', 'rater2_trait1']:
                criteria = """Ideas and Content: Assesses the clarity, focus, and development of the ideas presented.
Scores:
1 - The writing lacks a central idea or purpose.
2 - Main ideas and purpose are somewhat unclear or development is attempted but minimal.
3 - The reader can understand the main ideas, although they may be overly broad or simplistic, and the results may not be effective. Supporting detail is often limited, insubstantial, overly general, or occasionally slightly off-topic.
4 - The writing is clear and focused. The reader can easily understand the main ideas. Support is present, although it may be limited or rather general.'
5 - The writing is clear, focused and interesting. It holds the reader’s attention. Main ideas stand out and are developed by supporting details suitable to audience and purpose.
6 - The writing is exceptionally clear, focused, and interesting. It holds the reader’s attention throughout. Main ideas stand out and are developed by strong support and rich details suitable to audience and purpose."""
                score_range = (0, 6)
            elif score_type in ['rater1_trait2', 'rater2_trait2']:
                criteria = """Organization: Evaluates the logical structure and the coherence of the writing.
Scores:
1 - The writing lacks coherence; organization seems haphazard and disjointed. Even after rereading, the reader remains confused.
2 - The writing lacks a clear organizational structure. An occasional organizational device is discernible; however, the writing is either difficult to follow and the reader has to reread substantial portions, or the piece is simply too short to demonstrate organizational skills.
3 - An attempt has been made to organize the writing; however, the overall structure is inconsistent or skeletal.
4 - Organization is clear and coherent. Order and structure are present, but may seem formulaic.
5 - The organization enhances the central idea(s) and its development. The order and structure are strong and move the reader through the text.
6 - The organization enhances the central idea(s) and its development. The order and structure are compelling and move the reader through the text easily."""
                score_range = (0, 6)
            elif score_type in ['rater1_trait5', 'rater2_trait5']:
                criteria = """Sentence fluency: Assesses the flow and rhythm of the writing, along with the variety and structure of sentences.
Scores:
1 - The writing is difficult to follow or to read aloud. Sentences tend to be incomplete, rambling, or very awkward.
2 - The writing tends to be either choppy or rambling. Awkward constructions often force the reader to slow down or reread.
3 - The writing tends to be mechanical rather than fluid. Occasional awkward constructions may force the reader to slow down or reread.
4 - The writing flows; however, connections between phrases or sentences may be less than fluid. Sentence patterns are somewhat varied, contributing to ease in oral reading.
5 - The writing has an easy flow and rhythm. Sentences are carefully crafted, with strong and varied structure that makes expressive oral reading easy and enjoyable.
6 - The writing has an effective flow and rhythm. Sentences show a high degree of craftsmanship, with consistently strong and varied structure that makes expressive oral reading easy and enjoyable."""
                score_range = (0, 6)
            elif score_type in ['rater1_trait6', 'rater2_trait6']:
                criteria = """Conventions: Evaluates the writer's use of standard writing conventions such as grammar, punctuation, spelling, and capitalization."
Scores:
1 - Numerous errors in usage, spelling, capitalization, and punctuation repeatedly distract the reader and make the text difficult to read. In fact, the severity and frequency of errors are so overwhelming that the reader finds it difficult to focus on the message and must reread for meaning.
2 - The writing demonstrates little control of standard writing conventions. Frequent, significant errors impede readability.
3 - The writing demonstrates limited control of standard writing conventions (e.g., punctuation, spelling, capitalization, grammar and usage). Errors begin to impede readability.
4 - The writing demonstrates control of standard writing conventions (e.g., punctuation, spelling, capitalization, grammar and usage). Significant errors do not occur frequently. Minor errors, while perhaps noticeable, do not impede readability.
5 - The writing demonstrates strong control of standard writing conventions (e.g., punctuation, spelling, capitalization, grammar and usage) and uses them effectively to enhance communication. Errors are few and minor. Conventions support readability.
6 - The writing demonstrates exceptionally strong control of standard writing conventions (e.g., punctuation, spelling, capitalization, grammar and usage) and uses them effectively to enhance communication. Errors are so few and so minor that the reader can easily skim right over them unless specifically searching for them."""
                score_range = (0, 6)

        # Assign criteria and range to the DataFrame
        df.at[index, 'actual_criteria'] = criteria
        df.at[index, 'actual_range'] = f"{score_range[0]}-{score_range[1]}"

    return df

train_persuasive_expanded.reset_index(drop=True, inplace=True) # This is critical to avoid mistakes
train_persuasive_expanded_manual = assign_criteria_and_ranges(train_persuasive_expanded)



   essay_id  essay_set                                              essay  \
0      3217          2  Whispers and comments fill the room while peop...   
1      3217          2  Whispers and comments fill the room while peop...   
2       740          1  Dear @CAPS1 @CAPS2, I think that children are ...   
3       740          1  Dear @CAPS1 @CAPS2, I think that children are ...   
4      4155          2  I think that censorship in libraries is and is...   
5      4155          2  I think that censorship in libraries is and is...   

   rater1_domain1  rater2_domain1  domain1_score  rater1_domain2  \
0             4.0             4.0            4.0             4.0   
1             4.0             4.0            NaN             4.0   
2             4.0             NaN            8.0             NaN   
3             NaN             4.0            8.0             NaN   
4             3.0             3.0            3.0             2.0   
5             3.0             3.0            NaN    

In [37]:
print(train_persuasive_expanded_manual[["score_type", "actual_criteria", "actual_range"]].head(6))

       score_type                                    actual_criteria  \
0   domain1_score  Writing Applications:\nIdeas and Content: Does...   
1   domain2_score  Language Conventions:\nThe writing sample's co...   
2  rater1_domain1  'Scoring is based on the development and suppo...   
3  rater2_domain1  'Scoring is based on the development and suppo...   
4   domain1_score  Writing Applications:\nIdeas and Content: Does...   
5   domain2_score  Language Conventions:\nThe writing sample's co...   

  actual_range  
0          1-6  
1          1-4  
2          1-6  
3          1-6  
4          1-6  
5          1-4  


In [35]:
# Looks good! Applying to the other dataframes

# Persuasive:
val_persuasive_expanded.reset_index(drop=True, inplace=True)
val_persuasive_expanded_manual = assign_criteria_and_ranges(val_persuasive_expanded)

test_persuasive_expanded.reset_index(drop=True, inplace=True)
test_persuasive_expanded_manual = assign_criteria_and_ranges(test_persuasive_expanded)

# Source-Dependent
train_source_dependent_expanded.reset_index(drop=True, inplace=True)
train_source_dependent_expanded_manual = assign_criteria_and_ranges(train_source_dependent_expanded)

val_source_dependent_expanded.reset_index(drop=True, inplace=True)
val_source_dependent_expanded_manual = assign_criteria_and_ranges(val_source_dependent_expanded)

test_source_dependent_expanded.reset_index(drop=True, inplace=True)
test_source_dependent_expanded_manual = assign_criteria_and_ranges(test_source_dependent_expanded)

# Narrative

train_narrative_expanded.reset_index(drop=True, inplace=True)
train_narrative_expanded_manual = assign_criteria_and_ranges(train_narrative_expanded)

val_narrative_expanded.reset_index(drop=True, inplace=True)
val_narrative_expanded_manual = assign_criteria_and_ranges(val_narrative_expanded)

test_narrative_expanded.reset_index(drop=True, inplace=True)
test_narrative_expanded_manual = assign_criteria_and_ranges(test_narrative_expanded)

In the final step of this data preparation, categorical data is converted into one-hot-encoding columns, and a text block with all the relevant information is constructed from the other columns.

Early on, I created a prompt dictionary that I didn't end up using--however, it will prove useful as we now concatenate the content of various columns.

In [38]:
def create_model_input(features_df, prompts_criteria):
    # OHC - rater/score type info
    features_df = pd.get_dummies(features_df, columns=['score_type'])
    # Initialize final model input column
    features_df['final_input'] = ""

    # Loop through each row
    for idx, row in features_df.iterrows():
        prompt = prompts_criteria[row['essay_set']]['prompt']  # Get prompt from dictionary
        essay = row['essay']
        actual_criteria = row['actual_criteria']
        actual_range = row['actual_range']

        # Adding backticks around the essay to clarify the essay is different from the prompt
        text_input = f"Prompt: {prompt}\n\n`{essay}`\n\n{actual_criteria}\nRange: {actual_range}."

        additional_features = f"\nGrade Level: {row['grade_level']}\nDale-Chall Score: {row['dale_chall_score']}\nComplexity Difference: {row['complexity_difference']}\nParts-of-Speech Proportions: {row['pos_proportions']}\nDependencies Proportions: {row['dependency_proportions']}\nCharacter Count (Scaled): {row['character_count_scaled']}\nWord Count (Scaled): {row['word_count_scaled']}\nError Count (Scaled): {row['error_count_scaled']}\nError to Word Ratio (Scaled): {row['error_to_word_ratio_scaled']}"

        # Completing model input for row
        features_df.at[idx, 'final_input'] = text_input + additional_features

    return features_df

# Dictionary that includes the relevant information

prompts_criteria = {
    1: {
        "prompt": """More and more people use computers, but not everyone agrees that this benefits society. Those who support advances in technology believe that computers have a positive effect on people. They teach hand-eye coordination, give people the ability to learn about faraway places and people, and even allow people to talk online with other people. Others have different ideas. Some experts are concerned that people are spending too much time on their computers and less time exercising, enjoying nature, and interacting with family and friends.
Write a letter to your local newspaper in which you state your opinion on the effects computers have on people. Persuade the readers to agree with you.""",
        "criteria": {
            "description": """Scoring is based on the development and support of positions, organization, fluency, and awareness of audience.""",
            "scores": {
                1: "An undeveloped response that may take a position but offers no more than very minimal support. Typical elements: Contains few or vague details. Is awkward and fragmented. May be difficult to read and understand. May show no awareness of audience.",
                2: "An under-developed response that may or may not take a position. Typical elements: Contains only general reasons with unelaborated and/or list-like details. Shows little or no evidence of organization. May be awkward and confused or simplistic. May show little awareness of audience.",
                3: "A minimally-developed response that may take a position, but with inadequate support and details. Typical elements: Has reasons with minimal elaboration and more general than specific details. Shows some organization. May be awkward in parts with few transitions. Shows some awareness of audience.",
                4: "A somewhat-developed response that takes a position and provides adequate support. Typical elements: Has adequately elaborated reasons with a mix of general and specific details. Shows satisfactory organization. May be somewhat fluent with some transitional language. Shows adequate awareness of audience.",
                5: "A developed response that takes a clear position and provides reasonably persuasive support. Typical elements: Has moderately well elaborated reasons with mostly specific details. Exhibits generally strong organization. May be moderately fluent with transitional language throughout. May show a consistent awareness of audience.",
                6: "A well-developed response that takes a clear and thoughtful position and provides persuasive support. Typical elements: Has fully elaborated reasons with specific details. Exhibits strong organization. Is fluent and uses sophisticated transitional language. May show a heightened awareness of audience."
            }
        },
        "range": (1, 6)
    },
    2: {
        "prompt": """\"All of us can think of a book that we hope none of our children or any other children have taken off the shelf. But if I have the right to remove that book from the shelf -- that work I abhor -- then you also have exactly the same right and so does everyone else. And then we have no books left on the shelf for any of us.\" --Katherine Paterson, Author. Write a persuasive essay to a newspaper reflecting your views on censorship in libraries. Do you believe that certain materials, such as books, music, movies, magazines, etc., should be removed from the shelves if they are found offensive? Support your position with convincing arguments from your own experience, observations, and/or reading.""",
        "criteria": {
            "domain1": {
                "description": """Writing Applications:
Ideas and Content: Does the writing sample fully accomplish the task (e.g., support an opinion, summarize, tell a story, or write an article)? Does the writing sample include thorough, relevant, and complete ideas?
Organization: Are the ideas in the writing sample organized logically?
Style: Does the writing sample exhibit exceptional word usage? Does the writing sample demonstrate exceptional writing technique?
Voice: Does the writing sample demonstrate effective adjustment of language and tone to task and reader?""",
                "scores": {
                    1: "A performance that fails to accomplish the task. It exhibits considerable difficulty in areas of development, organization, and writing style. The writing is generally either very brief or rambling and repetitive, sometimes resulting in a response that may be difficult to read or comprehend.",
                    2: "A performance that only partially accomplishes the task. Some responses may exhibit difficulty maintaining a focus. Others may be too brief to provide sufficient development of the topic or evidence of adequate organizational or writing style.",
                    3: "A performance that minimally accomplishes the task. Some elements of development, organization, and writing style are weak.",
                    4: "A a good performance. It accomplishes the task, but generally needs to exhibit more development, better organization, or a more sophisticated writing style to receive a higher score.",
                    5: "A solid performance. It fully accomplishes the task, but lacks the overall level of sophistication and consistency of a Score Point 6 paper.",
                    6: "Rare. A performance that fully accomplishes the task in a thorough and insightful manner and has a distinctive quality that sets it apart as an outstanding performance."
                }
            },
            "domain2": {
                "description": """Language Conventions:
The writing sample's control of language skills, including grammar, spelling, punctuation, and overall sentence structure, as related to the Indiana Academic Standards.""",
                "scores": {
                    1: "The writing sample exhibits a minimal or less than minimal control of language skills. Errors are serious and numerous. The reader may need to stop and reread part of the sample and may struggle to discern the writer’s meaning.",
                    2: "The writing sample exhibits a fair control of language skills. Errors are typically frequent and may occasionally impede the flow of communication.",
                    3: "The writing sample exhibits a good control of language skills. Errors are occasional and are often of the first-draft variety; they have a minor impact on the flow of communication.",
                    4: "The writing sample exhibits a superior command of language skills and written English language conventions. A Score Point 4 paper provides evidence that the student has a thorough control of the concepts outlined in the Indiana Academic Standards associated with the student’s grade level. There are no errors that impair the flow of communication. Errors are generally of the first-draft variety or occur when the student attempts sophisticated sentence construction."
                }
            }
        },
        "range": {
            "domain1": (1, 6),
            "domain2": (1, 4)
        }
    },
    3: {
        "prompt": """Read the story "Rough Road Ahead" by Joe Kurmaskie. Note: This is a concise summary of the full text that students are expected to read. The full story provides more detail and context, which is crucial for a comprehensive understanding and response.
The story follows Joe Kurmaskie, a solo cyclist, who finds himself in a challenging situation due to misguided advice from a group of elderly locals he encountered at a campground near Lodi, California. The old men recommend a "shortcut" to Yosemite National Park, which Joe decides to take the following morning. This route initially seems promising but quickly leads him into difficulty.
Joe's journey takes him to a ghost town, where he begins to doubt the accuracy of the old men's directions. As he continues, the terrain becomes more demanding, and he struggles with heat and inadequate water supplies. He reaches a dilapidated water pump in a deserted area, only to find that the water it produces is nearly undrinkable.
Pushing forward, Joe encounters increasingly rough roads and experiences severe dehydration, which is exacerbated by the intense California heat. His situation becomes dire as he realizes the supposed nearby town is much farther away than he had been led to believe.
In a twist of irony, Joe comes across an abandoned building, which turns out to be a former Welch’s Grape Juice factory—a mocking reminder of his thirst. Driven by desperation, he sucks on pebbles to stimulate saliva production and mitigate his thirst.
Eventually, Joe reaches a fish camp where he finally finds relief and water. The story concludes with Joe reflecting on his ordeal and resolving to rely solely on his own map in the future, rather than trusting dubious advice.
Write a response that explains how the features of the setting affect the cyclist. In your response, include examples from the essay that support your conclusion.""",
        "criteria": {
            "description": "The response should address how the setting influences the cyclist's experiences, focusing on the interplay between the environment and the cyclist's decisions and emotions.",
            "scores": {
                0: "The response is completely irrelevant or incorrect, or there is no response.",
                1: "The response shows evidence of a minimal understanding of the text. May show evidence that some meaning has been derived from the text. May indicate a misreading of the text or the question. May lack information or explanation to support an understanding of the text in relation to the question.",
                2: "The response demonstrates a partial or literal understanding of the text. Addresses the demands of the question, although may not develop all parts equally. Uses some expressed or implied information from the text to demonstrate understanding. May not fully connect the support to a conclusion or assertion made about the text(s).",
                3: "The response demonstrates an understanding of the complexities of the text. Addresses the demands of the question. Uses expressed and implied information from the text. Clarifies and extends understanding beyond the literal."
            },
            "range": (0, 3)
        }
    },
    4: {
        "prompt": """Read the excerpt of "The Winter Hibiscus" by Minfong Ho. Note: This is a concise summary of the full excerpt that students are expected to read. The full excerpt provides more detail and context, which is crucial for a comprehensive understanding and response.
Saeng, a teenage girl, and her family have moved to the United States from Vietnam. As Saeng walks home after failing her driver's test, she sees a familiar plant. Later, she goes to a florist shop to see if the plant can be purchased.
Saeng, a character deeply connected to her cultural roots, experiences a poignant moment of nostalgia and loss during a visit to a greenhouse. Surrounded by familiar plants from her childhood, she is particularly drawn to a hibiscus, reminiscent of the vibrant flowers from her youth in her home country. This single blossom evokes vivid memories of her family's garden, the rituals she performed there, and the natural beauty that once surrounded her.
The narrative weaves through her emotional journey as she encounters other familiar plants, each stirring memories of home and her past life, especially the jasmine plant, which brings back specific memories of her grandmother. The sensory details of the plants' sights and smells trigger a cascade of emotions, leading her to purchase a hibiscus plant, despite its high cost, as a way to hold onto her past.
At home, her interaction with her mother reveals another layer of her struggle: she has failed an important test, adding to her sense of loss. However, the story ends on a note of hopeful resilience. Saeng plants the hibiscus in her new environment, symbolizing her attempt to root herself in new soil while preserving connections to her heritage. She resolves to retake the test, inspired by the cyclic return of the geese, signaling renewal and the continuation of life's cycles.
Read the last paragraph of the story.
\"When they come back, Saeng vowed silently to herself, in the spring, when the snows melt and the geese return and this hibiscus is budding, then I will take that test again.\"
Write a response that explains why the author concludes the story with this paragraph. In your response, include details and examples from the story that support your ideas.""",
        "criteria": {
            "description": "The response should explore the thematic significance of the concluding paragraph, relating it to the broader narrative and character development.",
            "scores": {
                0: "The response is completely irrelevant or incorrect, or there is no response.",
                1: "The response shows evidence of a minimal understanding of the text. May show evidence that some meaning has been derived from the text. May indicate a misreading of the text or the question. May lack information or explanation to support an understanding of the text in relation to the question.",
                2: "The response demonstrates a partial or literal understanding of the text. Addresses the demands of the question, although may not develop all parts equally. Uses some expressed or implied information from the text to demonstrate understanding. May not fully connect the support to a conclusion or assertion made about the text(s).",
                3: "The response demonstrates an understanding of the complexities of the text. Addresses the demands of the question. Uses expressed and implied information from the text. Clarifies and extends understanding beyond the literal."
            },
            "range": (0, 3)
        }
    },
    5: {
        "prompt": """Read "Home: The Blueprints of Our Lives" by Narciso Rodriguez.
My parents, originally from Cuba, arrived in the United States in 1956. After living for a year in a furnished one-room apartment, twenty-one-year-old Rawedia Maria and twenty-seven-year-old Narciso Rodriguez, Sr., could afford to move into a modest, three-room apartment I would soon call home. In 1961, I was born into this simple house, situated in a two-family, blond-brick building in the Ironbound section of Newark, New Jersey. Within its walls, my young parents created our traditional Cuban home, the very heart of which was the kitchen. My parents both shared cooking duties and unwittingly passed on to me their rich culinary skills and a love of cooking that is still with me today (and for which I am eternally grateful). Passionate Cuban music (which I adore to this day) filled the air, mixing with the aromas of the kitchen. Here, the innocence of childhood, the congregation of family and friends, and endless celebrations that encompassed both, formed the backdrop to life in our warm home. Growing up in this environment instilled in me a great sense that “family” had nothing to do with being a blood relative. Quite the contrary, our neighborhood was made up of mostly Spanish, Cuban, and Italian immigrants at a time when overt racism was the norm and segregation prevailed in the United States. In our neighborhood, despite customs elsewhere, all of these cultures came together in great solidarity and friendship. It was a close-knit community of honest, hardworking immigrants who extended a hand to people who, while not necessarily their own kind, were clearly in need. Our landlord and his daughter, Alegria (my babysitter and first friend), lived above us, and Alegria graced our kitchen table for meals more often than not. Also at the table were Sergio and Edelmira, my surrogate grandparents who lived in the basement apartment. (I would not know my “real” grandparents, Narciso the Elder and Consuelo, until 1970 when they were allowed to leave Cuba.) My aunts Bertha and Juanita and my cousins Arnold, Maria, and Rosemary also all lived nearby and regularly joined us at our table. Countless extended family members came and went — and there was often someone staying with us temporarily until they were able to get back on their feet. My parents always kept their arms and their door open to the many people we considered family, knowing that they would do the same for us.
Describe the mood created by the author in the memoir. Support your answer with relevant and specific information from the memoir.""",
        "criteria": {
            "description": "The response should effectively capture and articulate the mood created by the author, utilizing specific details and instances from the text to support the description.",
            "scores": {
                0: "The response is incorrect or irrelevant or contains insufficient information to demonstrate comprehension.",
                1: "The response is a minimal description of the mood created by the author. The response includes little or no information from the memoir and may include misinterpretations. OR The response relates minimally to the task.",
                2: "The response is a partial description of the mood created by the author. The response includes limited information from the memoir and may include misinterpretations.",
                3: "The response is a mostly clear, complete, and accurate description of the mood created by the author. The response includes relevant but often general information from the memoir.",
                4: "The response is a clear, complete, and accurate description of the mood created by the author. The response includes relevant and specific information from the memoir."
            },
            "range": (0, 4)
        }
    },
    6: {
        "prompt": """Read "The Mooring Mast" by Marcia Amidon Lusted. Note: This is a concise summary of the full text that students are expected to read. The full text provides more detail and context, which is crucial for a comprehensive understanding and response.
        "The Mooring Mast” discusses how The Empire State Building, originally envisioned to be the world's tallest building, was conceived during a time of intense architectural rivalry, particularly with the Chrysler Building, which was under construction at the same time. The Chrysler Building's architect added a secret 185-foot spire, temporarily claiming the title for the tallest structure. This move spurred the Empire State Building's planners, led by former New York Governor Al Smith, to push the design even further, ultimately setting its new height at 1,250 feet.
The increased height included an ambitious plan for a mooring mast at the top of the building, which was intended to serve as a docking station for dirigibles, or zeppelins. This feature was inspired by the growing interest in airship travel in the 1920s, which was seen as the future of transatlantic transportation. The dirigibles, large airships powered by engines and capable of carrying passengers in a gondola below, could theoretically dock at the Empire State Building, allowing passengers to embark and disembark right in the heart of Manhattan.
The idea was that the Empire State Building would not just be a static office building, but a dynamic transportation hub. It was to be equipped with facilities to handle airship passengers, including customs and ticketing areas on the 86th and 101st floors, effectively integrating the building into the emerging global travel network.
However, the practical realization of this vision faced numerous challenges. The architects and engineers had to consider how to securely anchor a thousand-foot-long dirigible without it posing a risk to the building and the city below. This required substantial modifications to the building's structure to handle the added stress of a moored airship, particularly in managing the forces exerted by winds at such a height.
Despite the meticulous planning and consultation with experts, including tours of naval airship operations, the idea faced insurmountable hurdles. The greatest obstacle was the inherent dangers posed by the flammable hydrogen gas used by most dirigibles (helium, a safer alternative, was in scarce supply). The tragic destruction of the Hindenburg in 1937 underscored the potential risks of docking airships in densely populated areas and ultimately sounded the death knell for the mooring mast concept.
Additionally, natural factors such as the unpredictable and often violent winds at the top of the building made mooring safely a near impossibility. Legal restrictions on low-flying aircraft over populated areas further complicated the situation. While there were attempts to use the mast—like the Goodyear blimp Columbia's stunt of delivering newspapers—the practical use of the mooring mast for airship docking was never realized.
By the late 1930s, with the rapid advancements in airplane technology, the age of the dirigible was coming to an end, rendering the mooring mast obsolete. The areas of the building designated for airship passengers were eventually repurposed for public use, including a high-altitude soda fountain and tea garden. The open observation deck, which was intended for airship passengers, has remained closed to the public.
In conclusion, the mooring mast of the Empire State Building stands as a fascinating example of ambitious architectural planning that failed to materialize. It reflects the optimism and forward-thinking of an era that envisioned a future of airships integrating with the urban landscape. While the mast was never used for its intended purpose, it remains an iconic part of the New York City skyline and a testament to the audacious dreams of its creators. The story of the Empire State Building’s mooring mast is a poignant reminder of the limits of contemporary technology and the unpredictability of progress.
Based on the excerpt, describe the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. Support your answer with relevant and specific information from the excerpt.""",
        "criteria": {
            "description": "The response should detail the technical and logistical challenges involved in the Empire State Building's mooring mast project, drawing directly from the text to support the analysis.",
            "scores": {
                0: "The response is totally incorrect or irrelevant, or contains insufficient evidence to demonstrate comprehension.",
                1: "The response is a minimal description of the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. The response includes little or no information from the excerpt and may include misinterpretations. OR The response relates minimally to the task.",
                2: "The response is a partial description of the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. The response includes limited information from the excerpt and may include misinterpretations.",
                3: "The response is a mostly clear, complete, and accurate description of the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. The response includes relevant but often general information from the excerpt.",
                4: "The response is a clear, complete, and accurate description of the obstacles the builders of the Empire State Building faced in attempting to allow dirigibles to dock there. The response includes relevant and specific information from the excerpt."
            },
            "range": (0, 4)
        }
    },
    7: {
        "prompt": """Write about patience. Being patient means that you are understanding and tolerant. A patient person experiences difficulties without complaining. Do only one of the following: write a story about a time when you were patient OR write a story about a time when someone you know was patient OR write a story in your own way about patience.""",
        "criteria": {
            "description": "The response should effectively communicate a narrative about patience, highlighting the subject's ability to endure challenges calmly and without complaint.",
            "scores": {
                "ideas": {
                    0: "Ideas are not focused on the task and/or are undeveloped.",
                    1: "Tells a story with ideas that are minimally focused on the topic and developed with limited and/or general details.",
                    2: "Tells a story with ideas that are somewhat focused on the topic and are developed with a mix of specific and/or general details.",
                    3: "Tells a story with ideas that are clearly focused on the topic and are thoroughly developed with specific, relevant details."
                },
                "organization": {
                    0: "No organization evident.",
                    1: "Organization and connections between ideas and/or events are weak.",
                    2: "Organization and connections between ideas and/or events are logically sequenced.",
                    3: "Organization and connections between ideas and/or events are clear and logically sequenced."
                },
                "style": {
                    0: "Ineffective use of language for the writer's purpose and audience.",
                    1: "Limited use of language, including lack of variety in word choice and sentences, may hinder support for the writer's purpose and audience.",
                    2: "Adequate command of language, including effective word choice and clear sentences, supports the writer's purpose and audience.",
                    3: "Command of language, including effective and compelling word choice and varied sentence structure, clearly supports the writer's purpose and audience."
                },
                "conventions": {
                    0: "Ineffective use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation.",
                    1: "Limited use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation for the grade level.",
                    2: "Adequate use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation for the grade level.",
                    3: "Consistent, appropriate use of conventions of Standard English for grammar, usage, spelling, capitalization, and punctuation for the grade level."
                }
            },
            "range": {
                "ideas": (0, 3),
                "organization": (0, 3),
                "style": (0, 3),
                "conventions": (0, 3)
            }        }
    },
    8: {
        "prompt": """We all understand the benefits of laughter. For example, someone once said, “Laughter is the shortest distance between two people.” Many other people believe that laughter is an important part of any relationship. Tell a true story in which laughter was one element or part.""",
        "criteria": {
            "description": "The response should narrate an event where laughter played a key role, illustrating its impact on relationships and personal connections.",
            "scores": {
                "ideas_and_content": {
                    "description": "Assesses the clarity, focus, and development of the ideas presented.",
                    "scores": {
                        1: "The writing lacks a central idea or purpose.",
                        2: "Main ideas and purpose are somewhat unclear or development is attempted but minimal.",
                        3: "The reader can understand the main ideas, although they may be overly broad or simplistic, and the results may not be effective. Supporting detail is often limited, insubstantial, overly general, or occasionally slightly off-topic.",
                        4: "The writing is clear and focused. The reader can easily understand the main ideas. Support is present, although it may be limited or rather general.",
                        5: "The writing is clear, focused and interesting. It holds the reader’s attention. Main ideas stand out and are developed by supporting details suitable to audience and purpose.",
                        6: "The writing is exceptionally clear, focused, and interesting. It holds the reader’s attention throughout. Main ideas stand out and are developed by strong support and rich details suitable to audience and purpose."
                    },
                },
                "organization": {
                    "description": "Evaluates the logical structure and the coherence of the writing.",
                    "scores": {
                        1: "The writing lacks coherence; organization seems haphazard and disjointed. Even after rereading, the reader remains confused.",
                        2: "The writing lacks a clear organizational structure. An occasional organizational device is discernible; however, the writing is either difficult to follow and the reader has to reread substantial portions, or the piece is simply too short to demonstrate organizational skills.",
                        3: "An attempt has been made to organize the writing; however, the overall structure is inconsistent or skeletal.",
                        4: "Organization is clear and coherent. Order and structure are present, but may seem formulaic.",
                        5: "The organization enhances the central idea(s) and its development. The order and structure are strong and move the reader through the text.",
                        6: "The organization enhances the central idea(s) and its development. The order and structure are compelling and move the reader through the text easily."
                    },
                },
                "sentence_fluency": {
                    "description": "Assesses the flow and rhythm of the writing, along with the variety and structure of sentences.",
                    "scores": {
                        1: "The writing is difficult to follow or to read aloud. Sentences tend to be incomplete, rambling, or very awkward.",
                        2: "The writing tends to be either choppy or rambling. Awkward constructions often force the reader to slow down or reread.",
                        3: "The writing tends to be mechanical rather than fluid. Occasional awkward constructions may force the reader to slow down or reread.",
                        4: "The writing flows; however, connections between phrases or sentences may be less than fluid. Sentence patterns are somewhat varied, contributing to ease in oral reading.",
                        5: "The writing has an easy flow and rhythm. Sentences are carefully crafted, with strong and varied structure that makes expressive oral reading easy and enjoyable.",
                        6: "The writing has an effective flow and rhythm. Sentences show a high degree of craftsmanship, with consistently strong and varied structure that makes expressive oral reading easy and enjoyable."
                    },
                },
                "conventions": {
                    "description": "Evaluates the writer's use of standard writing conventions such as grammar, punctuation, spelling, and capitalization.",
                    "scores": {
                        1: "Numerous errors in usage, spelling, capitalization, and punctuation repeatedly distract the reader and make the text difficult to read. In fact, the severity and frequency of errors are so overwhelming that the reader finds it difficult to focus on the message and must reread for meaning.",
                        2: "The writing demonstrates little control of standard writing conventions. Frequent, significant errors impede readability.",
                        3: "The writing demonstrates limited control of standard writing conventions (e.g., punctuation, spelling, capitalization, grammar and usage). Errors begin to impede readability.",
                        4: "The writing demonstrates control of standard writing conventions (e.g., punctuation, spelling, capitalization, grammar and usage). Significant errors do not occur frequently. Minor errors, while perhaps noticeable, do not impede readability.",
                        5: "The writing demonstrates strong control of standard writing conventions (e.g., punctuation, spelling, capitalization, grammar and usage) and uses them effectively to enhance communication. Errors are few and minor. Conventions support readability.",
                        6: "The writing demonstrates exceptionally strong control of standard writing conventions (e.g., punctuation, spelling, capitalization, grammar and usage) and uses them effectively to enhance communication. Errors are so few and so minor that the reader can easily skim right over them unless specifically searching for them."
                    },
                }
            },
            "range": {
                "ideas_and_content": (0, 6),
                "organization": (0, 6),
                "sentence_fluency": (0, 6),
                "conventions": (0, 6)
            }
        }
    }
}

train_persuasive_model_input = create_model_input(train_persuasive_expanded_manual, prompts_criteria)
eval_persuasive_model_input = create_model_input(val_persuasive_expanded_manual, prompts_criteria)
test_persuasive_model_input = create_model_input(test_persuasive_expanded_manual, prompts_criteria)

train_source_dependent_model_input = create_model_input(train_source_dependent_expanded_manual, prompts_criteria)
eval_source_dependent_model_input = create_model_input(val_source_dependent_expanded_manual, prompts_criteria)
test_source_dependent_model_input = create_model_input(test_source_dependent_expanded_manual, prompts_criteria)

train_narrative_model_input = create_model_input(train_narrative_expanded_manual, prompts_criteria)
eval_narrative_model_input = create_model_input(val_narrative_expanded_manual, prompts_criteria)
test_narrative_model_input = create_model_input(test_narrative_expanded_manual, prompts_criteria)


Now that the input dataframes are complete, we can write to csv (models will be trained in separate notebooks).

In [None]:
train_persuasive_model_input.to_csv('/content/drive/MyDrive/DATA698/model_data/train_persuasive_model_input.csv', index=False)
train_source_dependent_model_input.to_csv('/content/drive/MyDrive/DATA698/model_data/train_source_dependent_model_input.csv', index=False)
train_narrative_model_input.to_csv('/content/drive/MyDrive/DATA698/model_data/train_narrative_model_input.csv', index=False)

eval_persuasive_model_input.to_csv('/content/drive/MyDrive/DATA698/model_data/eval_persuasive_model_input.csv', index=False)
eval_source_dependent_model_input.to_csv('/content/drive/MyDrive/DATA698/model_data/eval_source_dependent_model_input.csv', index=False)
eval_narrative_model_input.to_csv('/content/drive/MyDrive/DATA698/model_data/eval_narrative_model_input.csv', index=False)

test_persuasive_model_input.to_csv('/content/drive/MyDrive/DATA698/model_data/test_persuasive_model_input.csv', index=False)
test_source_dependent_model_input.to_csv('/content/drive/MyDrive/DATA698/model_data/test_source_dependent_model_input.csv', index=False)
test_narrative_model_input.to_csv('/content/drive/MyDrive/DATA698/model_data/test_narrative_model_input.csv', index=False)