### Texas Data Project 
* Luke Hamm and Noah Husted

In [3]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import precision_recall_fscore_support, accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import Ridge

### Code Explanation: Experimental Evaluation of Text Classification

This code performs an experimental evaluation of classifiers on textual data with various preprocessing configurations. Below is a summary of the process:

1. **Data Preparation (`prepare_data`)**:
   - Loads and filters the dataset based on specific Sustainable Development Goals (SDGs) and quality thresholds.
   - Filters out entries that don't meet minimum agreement and label difference requirements.
   - Returns the text data (corpus) and corresponding labels.

2. **Classification (`run_classification`)**:
   - Transforms text data into numeric features using `CountVectorizer` or `TfidfVectorizer`.
   - Splits data into training and testing sets using `train_test_split`.
   - Trains a specified classifier (e.g., Naive Bayes, MLP, Logistic Regression) on the training set.
   - Tests the classifier on the test set and calculates evaluation metrics: precision, recall, F1-score, and accuracy.

3. **Configuration Evaluation (`evaluate_configurations`)**:
   - Runs multiple combinations of:
     - **Vectorizers**: Count or TF-IDF.
     - **N-gram ranges**: Unigrams, bigrams, or both.
     - **Classifiers**: Multinomial Naive Bayes, MLP, Logistic Regression.
   - Collects performance metrics for each configuration into a results table.

4. **Highlight Best Metrics**:
   - Identifies and bolds the best precision, recall, F1-score, and accuracy values across all configurations.

5. **Final Results**:
   - Outputs a summary table of performance metrics for all configurations, making it easy to identify the best combination of vectorizer, n-grams, and classifier.

This code is designed for analyzing the performance of classifiers on SDG-labeled textual data and determining the optimal preprocessing and classification setup.
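The two-stage filter in step 1 is easy to check on a handful of rows. The sketch below applies the same `query` and `isin` logic as `prepare_data` to a small, made-up DataFrame. It assumes only that the real file has the `agreement`, `labels_positive`, `labels_negative`, `sdg`, and `text` columns the function already uses; the row values are invented for illustration.

```python
import pandas as pd

# Made-up rows that mimic the columns prepare_data relies on
toy = pd.DataFrame({
    "text": ["a", "b", "c", "d", "e", "f"],
    "sdg": [14, 8, 14, 3, 9, 10],
    "labels_positive": [6, 5, 3, 7, 8, 9],
    "labels_negative": [0, 1, 1, 0, 1, 0],
    "agreement": [0.9, 0.6, 0.8, 1.0, 0.75, 0.5],
})

min_agreement, min_label_diff = 0.5, 2
selected_sdgs = (8, 9, 10, 14)

# Stage 1: quality thresholds (both comparisons are strict, so 0.5 and a diff of 2 fail)
kept = toy.query(
    f"agreement > {min_agreement} and (labels_positive - labels_negative) > {min_label_diff}"
)
# Stage 2: keep only the selected SDGs
kept = kept[kept["sdg"].isin(selected_sdgs)].reset_index(drop=True)
print(kept[["text", "sdg"]])
```

Row `c` is dropped by the label-difference threshold, `d` by the SDG subset, and `f` because its agreement is not strictly above 0.5.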


In [7]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import precision_recall_fscore_support, accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

def prepare_data(file_path, selected_sdgs=(2, 3, 4, 8, 9, 10, 14, 15, 16), min_agreement=0.5, min_label_diff=2):
    """
    Load the tab-separated OSDG file and keep only rows for the selected SDGs that
    exceed the annotator-agreement and label-difference thresholds.
    """
    text_df = pd.read_csv(file_path, sep="\t", quotechar='"')
    # Drop the first column, which is not used for classification
    text_df.drop(text_df.columns.values[0], axis=1, inplace=True)
    # Keep rows with agreement above min_agreement and more than min_label_diff net positive votes
    text_df = text_df.query(
        f"agreement > {min_agreement} and (labels_positive - labels_negative) > {min_label_diff}"
    ).reset_index(drop=True)

    # Keep only the requested SDGs (a tuple default avoids a shared mutable list)
    text_df = text_df[text_df['sdg'].isin(selected_sdgs)].reset_index(drop=True)
    text_df['sdg'] = text_df['sdg'].astype(int)  # Ensure labels are integers

    return text_df['text'].astype(str), text_df['sdg']

def run_classification(corpus, labels, classifier_algorithm, vectorizer, ngram_range=(1, 1), min_df=5):
    """
    Run classification and return performance metrics.
    """
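    vectorizer.set_params(ngram_range=ngram_range, min_df=min_df)
    # The vectorizer is fit on the full corpus before splitting, so test documents
    # also contribute to the vocabulary and the min_df counts
    X = vectorizer.fit_transform(corpus)
    X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.33, random_state=7)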
    
    clf = classifier_algorithm.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    
    precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="weighted")
    accuracy = accuracy_score(y_test, y_pred)
    return precision, recall, f1, accuracy

def evaluate_configurations(corpus, labels):
    """
    Test different combinations of vectorizers, n-grams, and classifiers.
    """
    vectorizers = [
        ("Count", CountVectorizer()),
        ("TF-IDF", TfidfVectorizer())
    ]
    ngram_ranges = [(1, 1), (2, 2), (1, 2)]
    classifiers = [
        ("MultinomialNB", MultinomialNB()),
        ("MLP", MLPClassifier(max_iter=300)),
        ("LogisticRegression", LogisticRegression(max_iter=500, solver="liblinear", multi_class="ovr"))
    ]
    
    results = []
    for vec_name, vectorizer in vectorizers:
        for ngram_range in ngram_ranges:
            for clf_name, clf in classifiers:
                try:
                    precision, recall, f1, accuracy = run_classification(
                        corpus, labels, clf, vectorizer, ngram_range=ngram_range, min_df=10
                    )
                    results.append({
                        "Vectorizer": vec_name,
                        "N-Grams": ngram_range,
                        "Classifier": clf_name,
                        "Precision": precision,
                        "Recall": recall,
                        "F1-Score": f1,
                        "Accuracy": accuracy
                    })
                except Exception as e:
                    print(f"Error with {vec_name}, {ngram_range}, {clf_name}: {e}")
    results_df = pd.DataFrame(results)
    return results_df

data_dir = "/Users/lukeh/Downloads/"
text_file = "osdg-community-data-v2024-04-01.csv"

corpus, labels = prepare_data(data_dir + text_file, selected_sdgs=[8, 9, 10, 14])

results_df = evaluate_configurations(corpus, labels)

for metric in ["Precision", "Recall", "F1-Score", "Accuracy"]:
    max_val = results_df[metric].max()
    results_df[metric] = results_df[metric].apply(lambda x: f"**{x:.2f}**" if x == max_val else f"{x:.2f}")

print(results_df)


   Vectorizer N-Grams          Classifier Precision    Recall  F1-Score  \
0       Count  (1, 1)       MultinomialNB  **0.87**  **0.87**  **0.87**   
1       Count  (1, 1)                 MLP      0.85      0.86      0.85   
2       Count  (1, 1)  LogisticRegression      0.86      0.86      0.86   
3       Count  (2, 2)       MultinomialNB      0.80      0.80      0.80   
4       Count  (2, 2)                 MLP      0.76      0.77      0.76   
5       Count  (2, 2)  LogisticRegression      0.77      0.77      0.77   
6       Count  (1, 2)       MultinomialNB      0.86      0.86      0.86   
7       Count  (1, 2)                 MLP      0.86      0.87      0.86   
8       Count  (1, 2)  LogisticRegression      0.87      0.87      0.87   
9      TF-IDF  (1, 1)       MultinomialNB      0.84      0.83      0.82   
10     TF-IDF  (1, 1)                 MLP      0.85      0.85      0.85   
11     TF-IDF  (1, 1)  LogisticRegression      0.87      0.87      0.86   
12     TF-IDF  (2, 2)    

## Results

The evaluation of classifier performance on the UN SDG labeled textual data yielded the following insights:

1. **Best Overall Performance**: 
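   - The combination of **Count Vectorizer** with **(1, 2) n-grams** and **Logistic Regression** reached **Precision: 0.87, Recall: 0.87, F1-Score: 0.87, Accuracy: 0.87**, tying for the top score at two decimal places. On the unrounded values, the highest precision, recall, and F1-score belong to **Multinomial Naive Bayes** with **Count Vectorizer (1, 1) n-grams**, which is why that row is the one bolded in the table.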

2. **Top TF-IDF Performance**:
   - The best-performing configuration for TF-IDF was **(1, 1) n-grams** with **Logistic Regression**, achieving an **Accuracy of 0.87** and an **F1-Score of 0.86**.

3. **Multinomial Naive Bayes**:
   - Performed well with **Count Vectorizer (1, 1) n-grams**, achieving **Precision: 0.87, Recall: 0.87, F1-Score: 0.87, Accuracy: 0.87**.
   - Its performance declined with TF-IDF (F1-Score 0.82 on unigrams) and with bigram-only features (F1-Score 0.80 with Count Vectorizer).

4. **MLP Classifier**:
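   - Achieved competitive results in most configurations but did not reach the highest score for any metric. It beat Multinomial Naive Bayes with TF-IDF unigrams (F1-Score 0.85 vs. 0.82) but trailed Logistic Regression in that setting (0.85 vs. 0.86).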

5. **Impact of Preprocessing**:
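   - With unigrams, **Count Vectorizer** clearly helped Multinomial Naive Bayes (F1-Score 0.87 vs. 0.82 with TF-IDF), while MLP and Logistic Regression scored within 0.01 of each other under the two vectorizers.
   - Bigram-only features (2, 2) lowered the F1-score of every classifier by 0.07 to 0.09 compared with unigrams under Count Vectorizer, likely due to sparsity: bigrams are rarer than single words, so more of them fall below the `min_df=10` cutoff.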
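In the table, the Recall and Accuracy columns always agree. This follows from `average="weighted"`: each class recall $TP_c / n_c$ is weighted by its share of the data $n_c / N$, so the sum collapses to $\sum_c TP_c / N$, which is accuracy. The sketch below checks this on made-up labels (not taken from the OSDG data).

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up true and predicted SDG labels for an imbalanced 4-class example
y_true = [8, 8, 8, 8, 9, 9, 14, 14, 14, 10]
y_pred = [8, 8, 9, 8, 9, 14, 14, 14, 8, 10]

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
accuracy = accuracy_score(y_true, y_pred)

# Weighted recall = sum_c (n_c / N) * (TP_c / n_c) = sum_c TP_c / N = accuracy
print(f"weighted recall = {recall:.4f}, accuracy = {accuracy:.4f}")
```

Weighted precision and F1-score have no such identity, which is why those columns can differ from accuracy.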

### Conclusion
Logistic Regression with **Count Vectorizer** and **(1, 2) n-grams** is among the most effective configurations for classifying SDG textual data, matching the best precision, recall, F1-score, and accuracy (0.87 at two decimal places) on this train/test split. It is the configuration used for the classification examples below.
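The comparison above rests on a single 67/33 split. One way to check whether the ranking holds up is stratified k-fold cross-validation. The sketch below assumes scikit-learn's `make_pipeline` and `cross_val_score` may stand in for the hand-written loop; placing the vectorizer inside the pipeline means it is refit on each training fold and never sees that fold's held-out documents. `cv_score` is a hypothetical helper, and the tiny corpus exists only so the sketch runs on its own.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline


def cv_score(corpus, labels, ngram_range=(1, 2), min_df=10, n_splits=5, seed=7):
    """Mean and standard deviation of weighted F1 over stratified folds (hypothetical helper)."""
    pipeline = make_pipeline(
        CountVectorizer(ngram_range=ngram_range, min_df=min_df),
        LogisticRegression(max_iter=500),
    )
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(pipeline, corpus, labels, cv=folds, scoring="f1_weighted")
    return scores.mean(), scores.std()


# Made-up corpus; with the real data, call cv_score(corpus, labels) on the output of prepare_data
toy_corpus = [
    "coral reef fish stocks ocean", "marine protected area ocean",
    "overfishing threatens ocean fish", "plastic pollution in the ocean",
    "sea level and ocean acidification", "ocean fisheries management",
    "jobs and wages for workers", "unemployment and decent work",
    "labour rights for workers", "economic growth creates jobs",
    "workers earn fair wages", "youth employment and jobs",
]
toy_labels = [14] * 6 + [8] * 6

# min_df=1 because the toy corpus is far too small for min_df=10
mean_f1, std_f1 = cv_score(toy_corpus, toy_labels, min_df=1, n_splits=3)
print(f"weighted F1: {mean_f1:.2f} +/- {std_f1:.2f}")
```

Reporting the spread across folds alongside the mean shows whether a 0.01 gap between two configurations is larger than the fold-to-fold noise.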


### Code Explanation: Text Classification Process

This code performs text classification using a trained model and vectorizer. Here’s how it works:

1. **Define the Classification Function (`classify_text`)**:
   - Accepts a single text input.
   - Transforms the text into a feature matrix using a trained vectorizer.
   - Predicts the SDG label using a trained classifier.
   - Returns the predicted SDG label.

2. **Train the Model and Vectorizer**:
   - The `CountVectorizer` is set to use both unigrams and bigrams (`ngram_range=(1, 2)`) and a minimum document frequency (`min_df=10`) to reduce noise.
   - The `LogisticRegression` model is trained on the entire filtered corpus (no held-out split) to associate features with SDG labels.

3. **Classify New Text**:
   - A long block of pasted text is transformed into a feature matrix using the trained vectorizer.
   - The classifier predicts which SDG the text most likely aligns with.

4. **Output the Prediction**:
   - The predicted SDG is printed for the user to evaluate.

This process can be used to evaluate the SDG alignment of any input text based on the trained model; the prediction is the SDG number (e.g., 14 for Life Below Water).
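`classify_text` returns only the single most likely SDG. When judging predictions like the ones below, it can help to see how much probability the model assigns to each SDG. The sketch below assumes a fitted `CountVectorizer` and `LogisticRegression`, as in the next cell; `rank_sdgs` is a hypothetical helper, and the six training texts are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def rank_sdgs(text, classifier, vectorizer, top_k=3):
    """Return up to top_k (sdg, probability) pairs for one text, highest first (hypothetical helper)."""
    # transform (not fit_transform) keeps the vocabulary learned at training time
    X_new = vectorizer.transform([text])
    probs = classifier.predict_proba(X_new)[0]
    ranked = sorted(zip(classifier.classes_, probs), key=lambda pair: pair[1], reverse=True)
    return [(int(sdg), float(p)) for sdg, p in ranked[:top_k]]


# Made-up training data standing in for the OSDG corpus
train_texts = [
    "ocean fish coral reef", "marine pollution in the sea", "fisheries and ocean health",
    "decent jobs and fair wages", "workers and employment", "labour market and jobs",
]
train_labels = [14, 14, 14, 8, 8, 8]

vectorizer = CountVectorizer(ngram_range=(1, 2), min_df=1)
classifier = LogisticRegression(max_iter=300)
classifier.fit(vectorizer.fit_transform(train_texts), train_labels)

print(rank_sdgs("new jobs for older workers", classifier, vectorizer))
```

With the notebook's objects, `rank_sdgs(pasted_text, best_classifier, best_vectorizer)` would show whether a prediction was a clear winner or a near tie.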


In [15]:
from sklearn.linear_model import LogisticRegression

def classify_text(text, classifier, vectorizer):
    """
    Classifies a single piece of text using the provided classifier and vectorizer.

    Args:
        text (str): The text to classify.
        classifier: The trained classifier.
        vectorizer: The fitted vectorizer. Its n-gram range, min_df, and vocabulary
            are fixed when it is fit, so they are not changed here.

    Returns:
        prediction (int): Predicted SDG label for the text.
    """
    # transform (not fit_transform) maps the text onto the vocabulary learned from the corpus
    X_new = vectorizer.transform([text])

    prediction = classifier.predict(X_new)[0]

    return prediction


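# Refit the selected configuration (Count Vectorizer, (1, 2) n-grams, min_df=10) on the full corpus.
# This LogisticRegression uses the default lbfgs solver, not the liblinear
# one-vs-rest setup used in evaluate_configurations.
best_vectorizer = CountVectorizer(ngram_range=(1, 2), min_df=10)
X = best_vectorizer.fit_transform(corpus)
best_classifier = LogisticRegression(max_iter=300)
best_classifier.fit(X, labels)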

pasted_text = """

Home
About
Research
Education
Get Involved
Contact Us
en_USEnglish
DONATE
SearchSearch
Beyond Tracking
Galapagos Giant Tortoise Feeding Ecology
Having discovered some of the mechanisms governing migration and other movements, we wanted to place our new knowledge into a wider ecological and conservation context. We knew that food availability influenced tortoise migrations, but we had little data on tortoise diets. Previous studies had been conducted in the 1980s before the explosion of introduced species on Santa Cruz Island.

We spent several hundred hours observing tortoises and recording all feeding activity, noting the plant species eaten and other details such as bites per minute. We also studied how plant communities change along the elevation gradient.

Tortoises eat at least 96 different plant species. Young mature leaves are preferred, and fruit also makes up a large part of the diet.

Galapagos tortoises consume many plant species that were introduced to Galapagos by people often preferring these over native and endemic plant species. Many of these species were brought to Galapagos as food sources for people and livestock and are highly nutritious so it is not surprising that tortoises feed on them. We found that the physical condition of tortoises may even be improved when feeding on plant species introduced to Galapagos by people.

An unfortunate consequence of feeding heavily on fruits from non-native plant species introduced to Galapagos, such as guava, is that as tortoises eat the fruit, they are also ingesting the seeds they contain. We found that an average pile of tortoise poo contains several hundred seeds of the highly invasive guava tree. 

We also found that it might take two to three weeks for a seed to pass through a tortoise’s digestive tract and during this time, a migrating tortoise may travel several kilometers.  For this reason, tortoises are capable of dispersing huge numbers of seeds over large distances.  This accelerates the spread of these species, which can be highly invasive and destructive to native Galapagos plant communities.

On the other hand, tortoises are also dispersing the seeds of many native species and potentially maintaining them on the islands. Tortoises can genuinely be called the “Gardeners of the Galapagos.”

This story is written up more fully in the following journal articles:

The Dominance of Introduced Plant Species in the Diets of Migratory Galapagos Tortoises Increases with Elevation on a Human‐Occupied Island
Seed dispersal by Galápagos tortoises
Digesta retention time in the Galápagos tortoise (Chelonoidis nigra)
Plant species dispersed by Galapagos tortoises surf the wave of habitat suitability under anthropogenic climate change
Galapagos Giant Tortoises and Human Interactions
As Galapagos develops economically and the human population rises, it will be increasingly important to understand the dynamics of tortoise-human interactions.

Like migratory species all over the world, long distance migration by Galapagos tortoises means that many tortoises leave the protective security of the Galapagos National Park and enter private farmland in the highlands of Santa Cruz and other inhabited islands. This has the potential to lead to challenges both for farmers and for tortoises – several giant tortoises can destroy a field of newly planted maize, for example. On the other hand, the presence of tortoises is compatible with cattle farming.

Furthermore, the private lands of Galapagos are dedicated to different uses, from arable agriculture, to livestock production, to tourism and urban development, all of which have different implications for tortoise conservation and the relationship between tortoise and people. As Galapagos develops economically and the human population rises, it will be increasingly important to understand the dynamics of tortoise-human interactions.

Additional research started in 2017 involves in-depth studies within farmlands to better understand how tortoise movements and behavior might depend on land use and habitat fragmentation, and which strategies could be implemented together with land owners and stakeholders to solve these potential conflicts and their consequences.

We have integrated research on this issue into the Galapagos Tortoise Movement Ecology Program. First, a brief stakeholder workshop was initiated in 2018 to bring private landowners, researchers, and members of the Galapagos National Park Directorate and local institutions together to generate a constructive dialogue to share experiences and discuss strategies and solutions to potential conflicts between people and tortoises.

Attitudes toward tortoises were almost exclusively positive or benign, with landowners recognizing the importance of tortoises to the entire economy of Galapagos, and the non-trivial recognition that the “tortoises were here first.” However, if tortoises are perceived to limit rather than enhance economic opportunity, this situation could change.

As a follow-up to this work, we facilitated a study on tortoise ecology in private lands conducted by PhD student Kyana Pike. The study involved analysis of how tortoise ranging and behavior in private lands are influenced by land use type and human infrastructure such as roads, fences, and natural and artificial ponds.

Global Positioning System (GPS)-tagged tortoises spend an average of 150 days per year in private lands, and a tortoise uses an average of four farms, up to a maximum of 24 different farms. Use of multiple farms under multiple different land uses by a single tortoise indicates that finding shared solutions to tortoise conservation and conflict mitigation on Santa Cruz island will require cooperation across the agricultural zone.

Contrary to our expectations, fences in their current configuration seem to present few serious barriers to tortoise movements. Most fences are in a poor state of repair, however even fences designed to keep tortoises out of crop areas are relatively porous.

This important research is in its early days and we will be reporting on outputs in detail in the future. Initial scientific publications on this work are published in the following journal articles:

Identifying Shared Strategies and Solutions to the Human–Giant Tortoise Interactions in Santa Cruz, Galapagos: A Nominal Group Technique Application
Migration by Galapagos giant tortoises requires landscape-scale conservation efforts
 
 
Our Story
About Us
Our People
Research Projects
Publications
Make a Difference
Donate
Grant a Wish
Volunteer
Our Supporters
Languages
en_USEnglish
es_ESEspañol
Policies
Privacy Policy
Cookie Policy
© 2020 - 2024 Galapagos Tortoise Movement Ecology Program. All rights reserved.
This website uses cookies. You consent to our cookies by clicking on "OK," by closing this notice, or by continuing to use this website.OKPrivacy Policy
"""

prediction = classify_text(pasted_text, best_classifier, best_vectorizer)

print("Text classified as:")
print(f"Predicted SDG: {prediction}")


Text classified as:
Predicted SDG: 14


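The model outputs SDG 14, "Life Below Water". This is only partly reasonable: the article is about the feeding ecology and land use of Galapagos giant tortoises, which fits SDG 15 (Life on Land) better. However, SDG 15 was not among the labels the model was trained on (8, 9, 10, and 14), so SDG 14 is the closest available category.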

In [17]:
pasted_text = """
Skip to main content
U.S. flag
An official website of the United States government

Here’s how you know
U.S. Department of Homeland Security logo

Blue Campaign. One Voice. One Mission. End Human Trafficking.
About Blue Campaign
What Is Human Trafficking?
How You Can Help
Learning Center
Events and Initiatives
Breadcrumb
Blue Campaign  What Is Human Trafficking?
Blue Campaign
About Blue Campaign
What Is Human Trafficking?
What is Forced Labor?
Indicators
Myths and Misconceptions
Identify a Victim
How You Can Help
Learning Center
Events and Initiatives
Subscribe
Subscribe to the Blue Campaign newsletter
What Is Human Trafficking?
Human trafficking involves the use of force, fraud, or coercion to obtain some type of labor or commercial sex act. Every year, millions of men, women, and children are trafficked worldwide – including right here in the United States. It can happen in any community and victims can be any age, race, gender, or nationality. Traffickers might use the following methods to lure victims into trafficking situations:

Violence
Manipulation
False promises of well-paying jobs
Romantic relationships 
Language barriers, fear of their traffickers, and/or fear of law enforcement frequently keep victims from seeking help, making human trafficking a hidden crime.

Traffickers look for people who are easy targets for a variety of reasons, including:

Psychological or emotional vulnerability
Economic hardship
Lack of a social safety net
Natural disasters
Political instability
The trauma caused by the traffickers can be so great that many may not identify themselves as victims or ask for help, even in highly public settings.

Many myths and misconceptions exist. Recognizing key indicators of human trafficking is the first step in identifying victims and can help save a life. Not all indicators listed are present in every human trafficking situation, and the presence or absence of any of the indicators is not necessarily proof of human trafficking.

The safety of the public as well as the victim is important. Do not attempt to confront a suspected trafficker directly or alert a victim to any suspicions. It is up to law enforcement to investigate suspected cases of human trafficking.

Visit the links below to learn more about human trafficking and how you can protect yourself and others.

Forced Labor
Image
Forced Labor Icon
Identify a Victim
Image
Identify a Victim Icon
Myths and Misconceptions
Image
Myths and Misconceptions Icon
Exploitation and How to Protect Yourself
Image
Exploitation and How to Protect Yourself Icon
Explotación y Cómo Protegerse A Sí Mismo
Image
Exploitation and How to Protect Yourself Icon
What is Human Trafficking? Infographic
Image
What is Human Trafficking Infograpghic 
¿Qué es la Trata de Personas?
Image
¿Qué es la Trata de Personas? Icon
Topics
Human Trafficking
Keywords
Blue Campaign Human Trafficking

To report suspected human trafficking to Federal law enforcement:
Image
To report suspected human trafficking, call 1-866-347-2423
1-866-347-2423
Para reportar un posible caso de trata de personas:
Image
Para reportar un posible caso de trata de personas: 1-866-347-2423
1-866-347-2423
To get help from the National Human Trafficking Hotline:
Image
To get help from the National Human Trafficking Hotline, call 1-888-373-7888
1-888-373-7888
or text HELP or INFO to BeFree (233733)

Obtenga ayuda de la Línea Directa Nacional de Trata de Personas:
Image
Obtenga ayuda de la Línea Directa Nacional de Trata de Personas: 1-888-373-7888
1-888-373-7888
o enviando un mensaje de texto con HELP o INFO a BeFree (233733)

Last Updated: 09/22/2022

Was this page helpful?
Yes No
Return to top
About Blue Campaign
What Is Human Trafficking?
How You Can Help
Learning Center
Events and Initiatives
Blue Campaign. One Voice. One Mission. End Human Trafficking.
Facebook
X
Instagram
Email
Contact Blue Campaign
Report Suspected Human Trafficking: 1-866-347-2423
Get Help from the National Human Trafficking Hotline: 1-888-373-7888
U.S. Department of Homeland Security Seal
DHS.gov/Blue-Campaign

An official website of the U.S. Department of Homeland Security

About Blue Campaign
Accessibility
FOIA Requests
Privacy Policy
DHS.gov


"""

prediction = classify_text(pasted_text, best_classifier, best_vectorizer)

print("Text classified as:")
print(f"Predicted SDG: {prediction}")


Text classified as:
Predicted SDG: 14


### Explanation for Misclassification (Predicted SDG: 14)

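The text about human trafficking was misclassified as SDG 14 (Life Below Water), which is clearly incorrect. Human trafficking falls under SDG 8 (target 8.7 calls for ending forced labour, modern slavery, and human trafficking), so among the four SDGs the model was trained on (8, 9, 10, and 14), SDG 8 would be the expected label. Possible reasons for the error include:

1. **Training Data Imbalance**:
   - If the training dataset contains more examples labeled as SDG 14, the model may lean toward predicting SDG 14, especially for ambiguous text.

2. **Feature Overlap**:
   - Keywords like "trafficking" might appear in SDG 14 training data (e.g., in the context of trafficking of aquatic species), causing confusion.

3. **Preprocessing Limitations**:
   - The n-gram range `(1, 2)` and `min_df=10` might not capture critical phrases specific to human trafficking: any n-gram that appears in fewer than 10 training documents is dropped from the vocabulary, so the model relies on more general terms instead. Adjusting these settings might help capture such phrases and reduce misclassifications.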


In [19]:
pasted_text = """
Skip to main content
U.S. flag
An official website of the United States government.

Here’s how you know 
United States Department of Labor
U.S. Department of Labor
Office of Disability Employment Policy
Babel Notice FAQ Contact Us
Search
Search ODEP

submenu
PROGRAM AREAS
STATE POLICY
RESEARCH AND EVALUATION
INITIATIVES
NEWS AND PUBLICATIONS
NDEAM
ADA
ABOUT
Breadcrumb
ODEP
Program Areas
Individuals
Older Workers
Older Workers
Financial Education and Incentives
Older Workers
Veterans
Women
Youth
Today, a confluence of factors is prompting America to change the way it thinks about age and work. The economic downturn, shifting perceptions of retirement, increased workplace flexibility, and the aging of the "baby boom" generation are all contributing to people working longer. Many of these capable, experienced mature workers develop disabilities as they age, or existing disabilities may become more significant. To retain the talents of these valuable, skilled workers, employers can implement a variety of workplace practices, many of which benefit all workers and make good business sense. The following resources provide more information about the topic of older workers:

Reports from ODEP's NTAR Leadership Center
National Technical Assistance and Research Center to Promote Leadership for Increasing the Employment and Economic Independence of Adults with Disabilities (NTAR Leadership Center) issued these reports that examine the disability implications of an aging workforce.

Community College Briefs
Postsecondary education is increasingly important for older job seekers' reemployment. Yet, they may face potential challenges in accessing and completing education and training due to their greater likelihood of having acquired age-related disabilities. The following three briefs provide new research data and findings on older students and dislocated workers researched by the NTAR Leadership Center. Existing data, such as that from the Integrated Postsecondary Education Data System, does not track the numbers of dislocated workers enrolled at community colleges; and nearly three-quarters of community colleges reported very few students with disabilities enrolled (less than three percent of their student population). Moreover, among older students, unidentified disabilities are not documented, which provides challenges for community colleges to document their statistics on older students with disabilities.

Community College Practices that Serve Older Dislocated Workers (PDF) — This brief highlights strategies and findings at five community colleges serving high numbers of dislocated workers and examines how those practices meet the needs of older workers, some of whom may be aging with or into disabilities.
How Are Community Colleges Serving the Needs of Older Students with Disabilities? (PDF) — To examine the issues related to older students with disabilities, this brief documents the research conducted to learn how colleges—in particular, community colleges—can better support the education and training needs of these students.
Working for Adults: State Policies and Community College Practices to Better Serve Adult Learners at Community Colleges During the Great Recession and Beyond (PDF) — This report synthesizes knowledge about how community colleges serve adults. The first section provides background and context on adults at community colleges, while the second section details the methodology used in this research. Other sections describe the findings on the enrollment of adults at community colleges, recent initiatives that have sought to support adults at community colleges, the state policy and college practices related to adults' enrollment, and research on student outcomes and the implications for what is known about state policy and college practice. The final section highlights recommendations for policymakers and practitioners interested in serving adults at community colleges, including those with disabilities.
ODEP and other DOL agency resources
Stay at Work/Return to Work Programs — A resource from the Employer Assistance and Resource Network on Disability Inclusion (EARN) that shows the advantages of stay-at-work and return-to-work (SAW/RTW) programs, discusses successful SAW/RTW retention strategies, and helps employers decide which SAW/RTW strategies will work in their organizations.
Job Accommodation Network guidance on accommodations for employees who are aging.
Accommodation and Compliance Series: Employees who are Aging
Our Aging Workforce: A Look at the Benefits of Job Accommodation — JAN's Consultants' Corner
Senior Community Service Employment Program (SCSEP) — ETA-funded community service and work-based program that provides subsidized training for low-income persons 55 or older who are unemployed and have poor employment prospects.
Age Discrimination in Employment Act of 1967
Age Discrimination in Employment Act of 1975
DOL Employment & Training Administration's Older Worker Initiative — The aging and retirement of the baby boom generation will have impacts on many aspects of our society, including possible labor and skill shortages. This initiative looks at ways to encourage older employees to continue working.
Retaining Older Workers — Information from EARN on strategies to retain the talents of older workers, who may develop disabilities as they age, and how to attract new, older workers.
Other resources
Making Work More Flexible: Opportunities and Evidence (PDF) — This report considers the availability, utilization, and demand for workplace flexibility, with a particular emphasis on older workers. Although many aspects of flexibility can benefit workers of any age, the desire of some older workers to phase into retirement introduces some special considerations.
Phased Retirement and Flexible Retirement Arrangements: Strategies for Retaining Skilled Workers (PDF) — Implementing appealing work arrangements that attract and retain workers 50+ may become increasingly important in an organization's bid to survive in today's marketplace. Phased retirement, which allows the employee to reduce work time in his or her current job, is regarded as one strategy to encourage hard-to-replace, experienced workers to postpone leaving the labor force. This report discusses the factors influencing the business need for phased retirement, how to create a phased retirement program, how to market a phased retirement program to employees, challenges in implementing phased retirement, proposed regulatory solutions, and cutting-edge employee programs.
Protecting Family Caregivers from Employment Discrimination (AARP Public Policy Institute) (PDF) — This report is the first in a series of AARP Public Policy Institute papers on issues of eldercare and the workplace. It highlights the realities of changing demographics and issues affecting working caregivers of older adults. It defines family responsibilities discrimination (FRD), explains why FRD is a policy matter, and describes the types of workplace discrimination encountered by working caregivers.
Highlights of a GAO Forum: Engaging and Retaining Older Workers (PDF)
Older Workers: Some Best Practices and Strategies for Engaging and Retaining Older Workers (PDF)
Building Your Career After 50 — AARP resources to assist 50 and older workers looking to switch careers or stay in their profession.
Older Workers: An exploration of the Benefits, Barriers, and Adaptations for Older People in the Workforce — A study from the National Institutes of Health which looks at the experiences and perceptions of paid workers aged 60 years and older. The study explains why older people continue to work and the barriers and facilitators they encounter.
Program Areas
State Policy
Research and Evaluation
Initiatives
News and Publications
NDEAM
ADA
About
United States Department of Labor
Office of Disability Employment Policy
An agency within the U.S. Department of Labor

200 Constitution Ave NW
Washington, DC 20210
1-866-4-USA-DOL

1-866-487-2365
Federal Government
White House
Coronavirus Resources
Disaster Recovery Assistance
DisasterAssistance.gov
USA.gov
Notification of EEO Violations
No Fear Act Data
U.S. Office of Special Counsel
Labor Department
About DOL
Guidance Search
Español
Office of Inspector General
Subscribe to the DOL Newsletter
Read the DOL Newsletter
Emergency Accountability Status Link
A to Z Index
About The Site
Freedom of Information Act
Privacy & Security Statement
Disclaimers
Important Website Notices
Plug-Ins Used on DOL.gov
Accessibility Statement

Submit Feedback

"""

prediction = classify_text(pasted_text, best_classifier, best_vectorizer)

print("Text classified as:")
print(f"Predicted SDG: {prediction}")


Text classified as:
Predicted SDG: 8


The model outputs SDG 8, "Decent Work and Economic Growth". This result makes sense, since the article's theme (encouraging and retaining older workers) aligns with the assigned SDG.
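A single predicted label does not show how clear-cut the decision was. One way to check is to rank the classifier's class probabilities with `predict_proba`, which `LogisticRegression` provides. The sketch below is a minimal illustration under stated assumptions: the corpus and labels are a hypothetical toy set (not the project's data), and `rank_sdgs` is a hypothetical helper, not part of the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpus and SDG labels (not the project's data)
corpus = [
    "older workers employment labor market jobs",
    "retain older workers in the workplace and labor force",
    "decent jobs wages and economic growth",
    "health care for older adults and disease prevention",
    "mental health stress and well-being in later life",
    "hospital patients medical treatment and health outcomes",
    "age discrimination and inequality against older people",
    "reduce inequality and discrimination in society",
    "social inclusion of marginalized groups and reduced inequality",
]
labels = [8, 8, 8, 3, 3, 3, 10, 10, 10]

vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(corpus)
clf = LogisticRegression(max_iter=300).fit(X, labels)


def rank_sdgs(text, classifier, vectorizer, top_k=3):
    """Hypothetical helper: return (label, probability) pairs, most likely first."""
    probs = classifier.predict_proba(vectorizer.transform([text]))[0]
    ranked = sorted(zip(classifier.classes_, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Print the probability assigned to each SDG for a short example sentence
for sdg, p in rank_sdgs("strategies to retain older workers in the labor force", clf, vectorizer):
    print(f"SDG {sdg}: {p:.2f}")
```

In the notebook, the same helper could be called as `rank_sdgs(pasted_text, best_classifier, best_vectorizer)`; a small gap between the top two probabilities would suggest the text sits between SDGs rather than clearly in one.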

In [22]:
from sklearn.linear_model import LogisticRegression

def classify_text(text, classifier, vectorizer):
    """
    Classifies a single piece of text using an already-fitted classifier and vectorizer.

    Args:
        text: The text to classify.
        classifier: The trained classifier.
        vectorizer: The fitted vectorizer (its n-gram range and min_df were fixed at fit time).

    Returns:
        prediction: Predicted SDG label for the text.
    """
    # Map the text onto the vocabulary learned during fit; unseen terms are ignored.
    # No set_params call here: min_df only matters during fit, and changing
    # ngram_range after fitting would make transform disagree with the fitted vocabulary.
    X_new = vectorizer.transform([text])

    prediction = classifier.predict(X_new)[0]

    return prediction


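# Fit the chosen configuration (unigrams + bigrams, min_df=10, logistic regression)
# on the full corpus; corpus and labels come from the earlier data-preparation step.
best_vectorizer = CountVectorizer(ngram_range=(1, 2), min_df=10)
X = best_vectorizer.fit_transform(corpus)
best_classifier = LogisticRegression(max_iter=300)
best_classifier.fit(X, labels)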

pasted_text = """
Skip to content
Home
Topics
Columns
Galleries
Alumni Books
Alumni Memories
Alumni Notes
Podcast
Michigan Today
Search for:
Search
Search
Subscribe
X (Twitter)
 
RSS
 update your information
Office of the VP for Communications – Keeping alumni and friends connected to U-M

Positively breaking the age code
Date
August 26, 2022
Written By
Claudia Capos
Breaking the age code art
Age is just a number
Becca Levy, BA '87
Scientist/author Becca Levy, BA ’87, is a professor of both epidemiology and psychology at Yale. (Image courtesy of Levy.)

American author and activist Betty Friedan once observed: “Aging is not ‘lost youth’ but a new stage of opportunity and strength.”

Those who share her optimism may find the pathway to longevity.

Scientist Becca Levy, BA ’87, a leading expert on the psychology of successful aging, says taking an upbeat attitude toward aging can not only improve your physical and mental health as you grow older ― but also may add nearly eight years to your lifespan.

She explains how our positive and negative age beliefs shape our behaviors, health, and, ultimately, our longevity in her new book Breaking the Age Code (William Morrow; April 12, 2022).

Levy also reveals that some health issues commonly associated with old age ― hearing loss, high blood pressure, and cardiovascular disease ― are the products of negative stereotypes and prejudices absorbed from our social surroundings. All too often, these fatalistic attitudes about the inevitability of declining health in later life become self-fulfilling prophecies.

“Age beliefs impact our health in ways big and small,” says Levy, who became interested in psychology as a U-M undergraduate. She completed her graduate work at Harvard, and is now a professor of epidemiology and of psychology at Yale.

“People who have more-positive beliefs tend to show benefits in health outcomes and healing compared to those who hold more-negative ones,” she says.

Surprise!
Over the past 20 years, Levy has conducted groundbreaking studies on different health conditions affected by attitudes toward aging. Her results show some surprising results:

Cognition ― People who stress positive age beliefs enjoy better memory performance.
Physical health ― Patients with favorable attitudes about aging are more likely to recover from severe disability.
Mental health ― Individuals who see aging as a positive experience have lower stress levels.
Longevity ― Younger people who adopt a positive outlook on aging live an average of 7.5 years longer.
While Levy’s research underscores the value of celebrating our advancing years as a time for creativity, exploration, and accomplishment, today’s reality is often quite different. All too frequently, personal views, cultural stereotypes, and institutional biases about aging are tilted in a negative direction.

In American society, old age is presented as something to be feared and avoided. Aging individuals are portrayed as fragile, forgetful, and a burden on society. The elderly are marginalized, ignored, and “put out to pasture.”

Levy describes this pervasive ageism in the U.S. and other countries as the “Silent Epidemic” because it operates, undetected, in so many different realms ― social media, advertising, pop culture, Hollywood, and health care.

Ageism as big business
Try the ABC Method to Bolster Positive Age Beliefs
Awareness:

Jot down five words or phrases that come to mind when you think of an older person.
Create a portfolio of positive older role models you admire.
Notice age beliefs in the media.
Think about ways to increase your intergenerational contacts and interactions.
Blame-shifting:

Find the real cause of unpleasant events or challenges, such as momentary forgetfulness.
Name the company or institution that benefits from negative age stereotypes.
Identify ageism when older workers are targeted.
Challenge:

Present accurate information to debunk negative age stereotypes.
Get involved in politics.
Confront ageism and negative age images in print, television, social media, and advertising.
During America’s early history, views of aging were generally positive. However, in the mid-1800s, this positivity began to wane, giving way to less-favorable age beliefs that have taken hold over time.

“The increase in negativity is due in part to the rise of advertisement and the growth of the antiaging industry,” says Levy, who has studied American age beliefs in written language spanning the past two centuries. “Companies have made a lot of money promoting negative images of aging as a way to sell their anti-aging products.”

The research firm Statista projects the global anti-aging market will generate more than $67 billion in 2022 by peddling pills, creams, tinctures, elixirs, hormonal supplements, and procedures that falsely claim to halt or even reverse aging.

The “medical disability” complex
Another source of ageism has been the increased “medicalization” of aging, Levy says.

“Advertisements typically present images of older people as patients and recipients of medical care,” she says. “These ads are not balanced by images showing the heterogeneity of older adults who come from diverse backgrounds and are engaged in different activities, such as work, volunteering, sports, and recreation.”

Television, movies, and social media have amplified misbeliefs about aging. Ageism has become “click bait” on Facebook, Twitter, Instagram, and YouTube. Agelining has relegated elderly residents to age-segregated senior housing where they are, in effect, “quarantined” from society.

Costly mistake
Ageism in the workplace has cost many senior workers with years of valuable experience in their jobs, livelihoods, and feelings of self-worth. Two-thirds of workers in America said they have witnessed or personally experienced age discrimination in their place of work, according to an AARP survey.

Western medicine relies heavily on negative age stereotypes, with their narrative of inevitable decline, because it’s profitable, according to Levy.

Levy teamed up with an economist and statistician to put a price tag on the health costs resulting from ageism. They found it totaled $63 billion per year in the U.S., more than the cost of morbid obesity.“The multibillion-dollar ‘medical disability complex’ is based on expensive procedures, devices, and pharmaceutical drugs, which are more profitable than prevention efforts,” she says. “When ageism is ignored, doctors are apt to dismiss treatable conditions, such as back pain or depression, as standard features of old age.”
This view of aging as a pathology can result in the undertreatment of elderly patients, she says.

Levy teamed up with an economist and statistician to put a price tag on the health costs resulting from ageism. They found it totaled $63 billion per year in the U.S.,  more than the cost of morbid obesity, one of America’s most expensive chronic conditions.

“The World Health Organization has called ageism the most prevalent and socially acceptable form of prejudice and discrimination today,” says Levy, who has testified before the U.S. Senate on the adverse effects of ageism. She also has contributed to briefs submitted to the U.S. Supreme Court in age-discrimination cases and has participated in United Nations discussions of ageism.

Dislodging the stereotypes
Despite the entrenched negative age stereotypes that permeate American society, Levy is convinced they can be dislodged and replaced by more-favorable views of aging and older people.

“The most important takeaway from my research is that we know age beliefs are malleable,” Levy says. “We can increase our awareness of them, challenge some negative ones, and strengthen some positive ones.”

She demonstrated in one lab study that exposing older participants to subliminal positive messages about aging improved their physical function, including walking and balance. Another study showed that positive age beliefs lowered stress and helped at-risk people ward off the symptoms of Alzheimer’s disease.

Igniting an age liberation movement
Breaking the age code bookcover
(William Morrow, April 2022.)

The time has come to shift from an age-declining to an age-thriving mindset, according to Levy, who presents a blueprint for overcoming structural ageism in her book.

“I think we are getting closer to a tipping point where an age liberation movement will take hold,” she says. “Growing numbers of people are becoming aware and angry about ageism.”

National organizations, such as the American Psychological Association, the Gerontological Society of America, and HelpAge International, have begun issuing urgent warnings about the hazards of ageism. The Gray Panthers in New York City continue to confront age stereotypes and discrimination head-on. In addition, the World Health Organization recently launched its Campaign to Combat Ageism, which 194 countries have endorsed. Levy is serving as a scientific adviser to the campaign.

However, until the groundswell of opposition to structural ageism in our social and policy institutions gains traction, ordinary people will need tools to navigate, question, and challenge negative stereotypes and attitudes on aging.

Levy has developed an “ABC Method” that individuals can use to harness the power of their own positive age beliefs to improve their health. This approach consists of three stages:

Increasing Awareness of negative age beliefs within and around us
Placing Blame on ageism and its societal sources
Challenging negative age beliefs
“As individuals acquire a greater sense of their value as older persons, they are more likely to participate in an age liberation movement,” Levy says. “The movement, in turn, is bound to further increase their sense of value as older persons. This cultural redefinition will contribute to a virtuous cycle.”

Comments
Thomas Cislo - 1972, 1975
Excellent book review and an important topic! Thank you.

Reply

Cliff Douglas - 1980, 1983
This is terrific work and a very important contribution to the public health and welfare of many millions of people. If of interest, I’ve tweeted it out at https://twitter.com/cdoug/status/1563525118562775044?s=21&t=kc0hIo9p9oXKNjYd3eIjJw to let more people know about it. Thanks!

Reply

Thomas F. Higby - 1958 M.D.
I’m frequently asked “How are you doing?”, my reply is: “Never having been at this stage of life before, I don’t know what to expect”. Actually, I think it’s “pretty well”.
I’ll be 90 in December. Today I have been mowing my grass; walk behind mower that I call my self propelled walker. I will have a practice session on my horn; I play in The Livingston County Concert Band. I’m regularly dating with a near 90 woman friend. I’m in contact with medical and H.S class-mates ( a rapidly disappearing bunch) It sounds like a good life. I do not expect anything beyond the grave.
Tom Higby

Reply

Paul Krueger - Years since graduation? 90 years old, 2 degrees UM, 1 Harvard
All sounds too negative.
Take a more positive approach.

Reply

Richard DeVries - 1973
Good article. It’s interesting, however, that the University of Michigan itself is very quick to treat patients differentially based solely on age. Patients over a certain age are quickly transported out of UM to a geriatric floor of St Joe’s with what I have observed as a lower level of care. The attitude of staff, especially younger staff, can be very ageist. I hope individuals in the UM system review this article.

Reply

Kerstin Prof.Dr. Schaper-Lang - 1976 and 1991
Thanks for this wonderful article. It gives hope to all of us! Yes, rethink age: we are the most expirienced and valuable workforce, what an enormous value for any society. Everybody has different genes, so overcome stereotypes.

Reply

Leave a comment:
First Name (required):
Last Name (required):
Email (required, will not be published):
Year(s) of graduation:
Comment:


About the Author
Claudia Capos headshot
CLAUDIA R. CAPOS has more than 25 years of experience in journalism, publications, and media relations. A graduate of the University of Michigan’s School of Journalism, she co-founded and co-published About Ann Arbor Magazine for five years before joining The Detroit News where, over a 10-year period, she held a variety of writing and editing positions and received three Pulitzer Prize nominations. Since leaving The News, Capos has provided communications services to clients ranging from the Big Three automotive manufacturers to small and mid-size companies. She's published articles on business, real estate, and travel in national and regional magazines, as well as major metropolitan newspapers.
Social
Share on: Share on X (Twitter) Share on Facebook
Topics
Arts & Culture
Athletics
Business and Economy
Campus Life
Education & Society
Environment
Heritage/Tradition
Innovation
International
Law & Politics
Philanthropy
Podcast: "Listen in, Michigan"
Research News
Science and Technology
University of Michigan
U-M Privacy Policy
About
Contact
Giving to U-M
Publications & Resources
U-M Alumni Association
Update Your Information
Unsubscribe
Podcast: “Listen in, Michigan”
Office of the VP for Communications
© 2024 The Regents of the University of Michigan
"""

prediction = classify_text(pasted_text, best_classifier, best_vectorizer)

print("Text classified as:")
print(f"Predicted SDG: {prediction}")


Text classified as:
Predicted SDG: 8


### Explanation for Misclassification (Predicted SDG: 8)

The model predicted SDG 8, "Decent Work and Economic Growth". Although the article does touch on work-related themes (e.g., ageism in the workplace and age discrimination against older workers), SDG 10 (Reduced Inequalities) or SDG 3 (Good Health and Well-being) would have been a better match. Possible reasons for this misclassification include:

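1. **Vocabulary Mismatch**:
   - The input text might use terms or phrases that are weakly represented in the training examples for SDG 3 and SDG 10.
   - For example, phrases like "ageism" or "positive aging" might be rare in those examples, while the workplace vocabulary the article also uses (e.g., "older workers", "age discrimination in their place of work") may be more strongly associated with SDG 8.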

2. **Ambiguity in Content**:
   - The text discusses broader societal topics (e.g., aging, public health, and discrimination) that could overlap with other SDGs, such as SDG 10 (Reduced Inequalities) or SDG 3 (Good Health and Well-being).

3. **Training Data Imbalance**:
   - If SDG 8 examples make up a disproportionately large share of the training data (or SDG 3 and SDG 10 a small share), the classifier may lean toward SDG 8 whenever a text mentions work or employment, even when another SDG fits better.

4. **N-Gram and Frequency Thresholds**:
   - The current configuration uses n-grams `(1, 2)` and `min_df=10`, so any term or bigram that appears in fewer than 10 training documents is dropped from the vocabulary. This setup might filter out rare terms or phrases that would help distinguish SDG 8 from SDG 3 and SDG 10.

5. **Human Bias in Text**:
   - The text focuses on societal and psychological aspects of aging rather than direct economic indicators (e.g., employment, labor policies), which are typically associated with SDG 8 (Decent Work and Economic Growth). The model may therefore not generalize well to diverse content like this article.
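The training-data-imbalance explanation can be checked before retraining anything: count how many examples each SDG has, and if one class dominates, try `LogisticRegression(class_weight="balanced")`, which reweights each class by `n_samples / (n_classes * class_count)`. The sketch below uses a hypothetical, deliberately imbalanced toy label set (not the project's data); whether reweighting actually fixes this particular misclassification is untested.

```python
from collections import Counter

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical, deliberately imbalanced data: SDG 8 dominates
corpus = [
    "decent work and jobs",
    "labor market and wages",
    "employment and economic growth",
    "jobs for older workers",
    "workplace training and skills",
    "health of older adults",
    "age discrimination and inequality",
]
labels = [8, 8, 8, 8, 8, 3, 10]

# Share of training examples per SDG
counts = Counter(labels)
print({sdg: round(n / len(labels), 2) for sdg, n in sorted(counts.items())})

# The per-class weights that class_weight="balanced" applies
weights = len(labels) / (len(counts) * np.array([counts[c] for c in sorted(counts)]))
print({c: round(float(w), 2) for c, w in zip(sorted(counts), weights)})

# Refit with balanced class weights so minority SDGs count more in the loss
X = CountVectorizer().fit_transform(corpus)
clf = LogisticRegression(class_weight="balanced", max_iter=300).fit(X, labels)
```

On the real data, the same `Counter(labels)` check would show directly whether SDG 8 is over- or underrepresented, which decides which way the imbalance argument cuts.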
