<center>
<img src="https://laelgelcpublic.s3.sa-east-1.amazonaws.com/lael_50_years_narrow_white.png.no_years.400px_96dpi.png" width="300" alt="LAEL 50 years logo">
<h3>APPLIED LINGUISTICS GRADUATE PROGRAMME (LAEL)</h3>
</center>
<hr>

# Corpus Linguistics - Study 2 - Phase 3

This phase aims to compile a [Greenpeace Stories](https://www.greenpeace.org/international/story/) corpus for CAD 2026 and IVACS 2026.

## Required Python packages

- pandas

## Import the required libraries

In [1]:
import pandas as pd
import os
import sys
import logging

## Define input variables

In [2]:
input_directory = 'cl_st2_ph1_arianne'
output_directory = 'cl_st2_ph2_arianne'

## Create output directory

In [3]:
# Check if the output directory already exists. If it does, do nothing. If it doesn't exist, create it.
if os.path.exists(output_directory):
    print('Output directory already exists.')
else:
    try:
        os.makedirs(output_directory)
        print('Output directory successfully created.')
    except OSError as e:
        print('Failed to create the directory:', e)
        sys.exit(1)

Output directory already exists.


## Set up logging

In [4]:
log_filename = f"{output_directory}/{output_directory}.log"

In [5]:
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    filename=log_filename
)
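
The configuration above writes messages at `INFO` level and higher to the log file. The following is a minimal sketch of how a later step could record a message. It assumes a temporary log file and a logger named `demo`, both hypothetical and not part of the notebook.

```python
import logging
import os
import tempfile

# Hypothetical log location, used for this demonstration only.
demo_log = os.path.join(tempfile.mkdtemp(), "demo.log")

# A dedicated logger (hypothetical name) leaves the root logger untouched.
demo_logger = logging.getLogger("demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False

handler = logging.FileHandler(demo_log, encoding="utf-8")
handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
demo_logger.addHandler(handler)

demo_logger.info("Loaded %d paragraphs", 3)
demo_logger.debug("DEBUG is below the INFO threshold, so this line is not written")

# Detach and close the handler so the file is flushed before reading it back.
demo_logger.removeHandler(handler)
handler.close()

with open(demo_log, encoding="utf-8") as f:
    log_lines = f.read().splitlines()
```

Only the `INFO` message reaches the file; the `DEBUG` message is filtered out by the level setting.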

## Functions

### Create output subdirectories

In [6]:
def create_directory(path):
    """Creates a subdirectory if it doesn't exist."""
    if not os.path.exists(path):
        try:
            os.makedirs(path)
            print(f"Successfully created the directory: {path}")
        except OSError as e:
            print(f"Failed to create the {path} directory: {e}")
            sys.exit(1)
    else:
        print(f"Directory already exists: {path}")
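
As a possible simplification, `os.makedirs` accepts `exist_ok=True`, which makes the explicit existence check unnecessary. The sketch below uses a hypothetical helper, `ensure_directory`, and a temporary base directory; it is an alternative under those assumptions, not the function used in this notebook.

```python
import os
import tempfile


def ensure_directory(path):
    """Hypothetical variant: create `path` and its parents if missing.

    Returns True if the directory was created by this call, False if it already existed.
    """
    existed = os.path.isdir(path)
    os.makedirs(path, exist_ok=True)
    return not existed


# Temporary base directory standing in for the real output directory.
base = tempfile.mkdtemp()
target = os.path.join(base, "grp", "02_extracted")

first_call = ensure_directory(target)   # creates the nested directories
second_call = ensure_directory(target)  # finds them already in place
```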

## Data Wrangling [Greenpeace Stories](https://www.greenpeace.org/international/story/)

### Define local variables

In [7]:
id = 'grp'
extracted_dir = '02_extracted'
path = os.path.join(output_directory, id, extracted_dir)
dataset_filename_1 = f"{id}_list"
dataset_filename_2 = f"{id}"
dataset_filename_3 = f"{id}_1000"

### Create output subdirectory

In [8]:
create_directory(path)

Directory already exists: cl_st2_ph2_arianne/grp/02_extracted


### Import the data into a DataFrame

In [9]:
df_grp_paragraph = pd.read_json(f"{input_directory}/{dataset_filename_2}.jsonl", lines=True)

In [10]:
df_grp_paragraph['Post Date'] = pd.to_datetime(df_grp_paragraph['Post Date'], unit='ms')

In [11]:
df_grp_paragraph

Unnamed: 0,Source,Post Term,Post Tags,Title,Post URL,Authors,Post Date,Post ID,Category,Paragraph,Text Paragraph
0,Greenpeace,Stories,Photography,Greenpeace Pictures of the Week,https://www.greenpeace.org/international/story...,Greenpeace International,2025-08-15 01:45:33,grp000000,Greenpeace,Paragraph 1,From a banner protest at the plastic treaty in...
1,Greenpeace,Stories,Photography,Greenpeace Pictures of the Week,https://www.greenpeace.org/international/story...,Greenpeace International,2025-08-15 01:45:33,grp000000,Greenpeace,Paragraph 2,🇬🇧 England – Greenpeace UK’s climbers install ...
2,Greenpeace,Stories,Photography,Greenpeace Pictures of the Week,https://www.greenpeace.org/international/story...,Greenpeace International,2025-08-15 01:45:33,grp000000,Greenpeace,Paragraph 3,After securing a giant 12m x 8m canvas to one ...
3,Greenpeace,Stories,Photography,Greenpeace Pictures of the Week,https://www.greenpeace.org/international/story...,Greenpeace International,2025-08-15 01:45:33,grp000000,Greenpeace,Paragraph 4,The work starkly visualises the wound inflicte...
4,Greenpeace,Stories,Photography,Greenpeace Pictures of the Week,https://www.greenpeace.org/international/story...,Greenpeace International,2025-08-15 01:45:33,grp000000,Greenpeace,Paragraph 5,"🇨🇭 Switzerland – Juan Carlos Monterrey Gómez, ..."
...,...,...,...,...,...,...,...,...,...,...,...
23415,Greenpeace,Stories,"AboutUs, 50Years",Bob Hunter 1941 – 2005,https://www.greenpeace.org/international/story...,Greenpeace International,2005-05-02 15:15:00,grp001356,Greenpeace,Paragraph 13,He joined Toronto’s City TV as an ecology spec...
23416,Greenpeace,Stories,"AboutUs, 50Years",Bob Hunter 1941 – 2005,https://www.greenpeace.org/international/story...,Greenpeace International,2005-05-02 15:15:00,grp001356,Greenpeace,Paragraph 14,Over the years he continued to contribute to G...
23417,Greenpeace,Stories,"AboutUs, 50Years",Bob Hunter 1941 – 2005,https://www.greenpeace.org/international/story...,Greenpeace International,2005-05-02 15:15:00,grp001356,Greenpeace,Paragraph 15,"In a recent book, Rex Weyler writes about refl..."
23418,Greenpeace,Stories,"AboutUs, 50Years",Bob Hunter 1941 – 2005,https://www.greenpeace.org/international/story...,Greenpeace International,2005-05-02 15:15:00,grp001356,Greenpeace,Paragraph 16,“The ironies and tension of history simultaneo...


### Reassemble the blog posts from the paragraphs

The `df_grp_paragraph` DataFrame holds one row per paragraph. To rebuild each blog post, the rows are sorted by `Post ID` and by paragraph position, the `Text Paragraph` values of each post are joined with newline characters, and the resulting text is merged back with the post-level metadata. The paragraph-level columns (`Paragraph` and `Text Paragraph`) are dropped in the process.



In [12]:
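# Sort by Post ID and by paragraph number to ensure correct order.
# The number is extracted from labels such as 'Paragraph 12' because a plain string sort
# would place 'Paragraph 10' before 'Paragraph 2'.
df_grp_paragraph_sorted = (
    df_grp_paragraph
    .assign(paragraph_number=df_grp_paragraph['Paragraph'].str.extract(r'(\d+)', expand=False).astype(int))
    .sort_values(by=['Post ID', 'paragraph_number'])
    .drop(columns='paragraph_number')
)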

# Group by Post ID and join the Text Paragraphs with a newline
df_grp_texts = df_grp_paragraph_sorted.groupby('Post ID')['Text Paragraph'].apply(lambda x: '\n'.join(x)).reset_index()
df_grp_texts.rename(columns={'Text Paragraph': 'Text'}, inplace=True)

# Merge the reassembled text back with the original metadata (dropping paragraph specific columns first to avoid duplicates)
# We keep the first occurrence of metadata for each Post ID
df_grp_metadata = df_grp_paragraph_sorted.drop(columns=['Paragraph', 'Text Paragraph']).drop_duplicates(
    subset=['Post ID'])

# Create the final df_grp DataFrame
df_grp = pd.merge(df_grp_metadata, df_grp_texts, on='Post ID')

df_grp


Unnamed: 0,Source,Post Term,Post Tags,Title,Post URL,Authors,Post Date,Post ID,Category,Text
0,Greenpeace,Stories,Photography,Greenpeace Pictures of the Week,https://www.greenpeace.org/international/story...,Greenpeace International,2025-08-15 01:45:33,grp000000,Greenpeace,From a banner protest at the plastic treaty in...
1,Greenpeace,Stories,Forests,Environmental storytelling for a Chinese audie...,https://www.greenpeace.org/international/story...,August Rick,2025-08-14 01:40:25,grp000001,Nature,"Liu Min, better known as “Aunt Bear”, was one ..."
2,Greenpeace,Stories,AlternativeFutures,5 reasons Greenpeace calls for new global tax ...,https://www.greenpeace.org/international/story...,Nina Stros,2025-08-13 13:47:04,grp000002,Social and Economic Systems,Smoke from the Canadian wildfires drifting thr...
3,Greenpeace,Stories,Photography,Greenpeace Pictures of the Week,https://www.greenpeace.org/international/story...,Greenpeace International,2025-08-08 05:09:19,grp000003,Greenpeace,From a demonstration at the plastic treaty in ...
4,Greenpeace,Stories,"Peace, Nuclear",From Hiroshima to Gaza: defending peace,https://www.greenpeace.org/international/story...,Greenpeace France,2025-08-07 15:22:58,grp000004,Energy,"On August 6 and 9, 1945, two atomic bombs pulv..."
...,...,...,...,...,...,...,...,...,...,...
1352,Greenpeace,Stories,"AboutUs, 50Years",Dorothy Stowe 1920 – 2010,https://www.greenpeace.org/international/story...,Rex Weyler,2010-07-23 07:48:13,grp001352,Greenpeace,Greenpeace co-founder Dorothy Stowe passed awa...
1353,Greenpeace,Stories,"AboutUs, 50Years",Jim Bohlen 1926 – 2010,https://www.greenpeace.org/international/story...,Greenpeace International,2010-07-06 08:06:09,grp001353,Greenpeace,There’s an old joke that you can walk into any...
1354,Greenpeace,Stories,"AboutUs, 50Years",A chat with the first Rainbow Warriors,https://www.greenpeace.org/international/story...,Michael Friedrich,2007-06-24 12:03:00,grp001354,Greenpeace,"In Vancouver, on Canada’s Pacific coast, Green..."
1355,Greenpeace,Stories,"AboutUs, 50Years",Amchitka: the founding voyage,https://www.greenpeace.org/international/story...,Greenpeace International,2007-05-14 13:56:16,grp001355,Greenpeace,"In 1971, a small group of activists set sail t..."
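
The reassembly step can be checked on toy data. The sketch below is an illustration under stated assumptions: the DataFrame `toy`, its values, and the helper column `paragraph_number` are invented for the example. It orders paragraphs by the number inside the `Paragraph` label, joins them per post, and merges the text back with the metadata.

```python
import pandas as pd

# Toy data: one post with 11 shuffled paragraphs and one post with 2.
order = [10, 2, 1, 11, 3, 4, 5, 6, 7, 8, 9]
toy = pd.DataFrame(
    {
        "Post ID": ["grp000000"] * 11 + ["grp000001"] * 2,
        "Title": ["Long post"] * 11 + ["Short post"] * 2,
        "Paragraph": [f"Paragraph {n}" for n in order] + ["Paragraph 2", "Paragraph 1"],
        "Text Paragraph": [f"p{n}" for n in order] + ["b", "a"],
    }
)

# Order by the number inside the label, not by the label string.
toy["paragraph_number"] = toy["Paragraph"].str.extract(r"(\d+)", expand=False).astype(int)
toy_sorted = toy.sort_values(by=["Post ID", "paragraph_number"])

# Join the paragraphs of each post with a newline.
texts = (
    toy_sorted.groupby("Post ID")["Text Paragraph"]
    .agg("\n".join)
    .reset_index()
    .rename(columns={"Text Paragraph": "Text"})
)

# Keep one metadata row per post and attach the reassembled text.
metadata = toy_sorted[["Post ID", "Title"]].drop_duplicates(subset=["Post ID"])
posts = metadata.merge(texts, on="Post ID")

# For comparison: a plain string sort puts 'Paragraph 10' before 'Paragraph 2'.
string_order = sorted(toy["Paragraph"][:11])
```

The `string_order` list shows why the numeric part matters: sorted as strings, the labels begin `Paragraph 1`, `Paragraph 10`, `Paragraph 11`, which would scramble any post with ten or more paragraphs.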


### Identifying potential fragments

Posts whose `Text` holds a single paragraph may be fragments rather than complete posts. Because the text was built by joining paragraphs with a newline character (`\n`), a single-paragraph post is one whose `Text` contains no newline.

In [13]:
# Identify rows where the 'Text' column consists of a single paragraph
# Since paragraphs were joined by '\n', a single paragraph implies no newline characters are present
df_grp_single_paragraph = df_grp[df_grp['Text'].str.count('\n') == 0]
df_grp_single_paragraph

Unnamed: 0,Source,Post Term,Post Tags,Title,Post URL,Authors,Post Date,Post ID,Category,Text
118,Greenpeace,Stories,"ArcticSunrise, RainbowWarrior, Witness, AboutUs",Best of Greenpeace ships 2024,https://www.greenpeace.org/international/story...,Sudhanshu Malhotra,2024-12-18 07:55:26,grp000118,Greenpeace,Our ships are a source of strength for million...
962,Greenpeace,Stories,Climate,"Amidst this climate crisis, I found hope from ...",https://www.greenpeace.org/international/story...,Nanticha Ocharoenchai,2019-05-01 02:36:13,grp000962,Social and Economic Systems,All it took was determination and initiative
1031,Greenpeace,Stories,Climate,A super-charged typhoon took my family away. I...,https://www.greenpeace.org/international/story...,Rashini Suriyaarachchi,2018-10-09 05:06:43,grp001031,Social and Economic Systems,Many of us think about the impacts of climate ...
1210,Greenpeace,Stories,"Consumption, Food, Health","Seeing is believing: Growing food for people, ...",https://www.greenpeace.org/international/story...,Reyes Tirado,2017-01-13 00:31:00,grp001210,Nature,“Ojos hacen fe.” Those are the words of Lucy M...


Of these four posts, only row 118 is a complete post rather than a fragment, so it must stay in the corpus.

Row 118 is therefore removed from `df_grp_single_paragraph`, which from this point on serves as the list of fragments to delete from the main dataset.



In [14]:
# Drop row 118 from df_grp_single_paragraph
df_grp_single_paragraph = df_grp_single_paragraph.drop(118)
df_grp_single_paragraph

Unnamed: 0,Source,Post Term,Post Tags,Title,Post URL,Authors,Post Date,Post ID,Category,Text
962,Greenpeace,Stories,Climate,"Amidst this climate crisis, I found hope from ...",https://www.greenpeace.org/international/story...,Nanticha Ocharoenchai,2019-05-01 02:36:13,grp000962,Social and Economic Systems,All it took was determination and initiative
1031,Greenpeace,Stories,Climate,A super-charged typhoon took my family away. I...,https://www.greenpeace.org/international/story...,Rashini Suriyaarachchi,2018-10-09 05:06:43,grp001031,Social and Economic Systems,Many of us think about the impacts of climate ...
1210,Greenpeace,Stories,"Consumption, Food, Health","Seeing is believing: Growing food for people, ...",https://www.greenpeace.org/international/story...,Reyes Tirado,2017-01-13 00:31:00,grp001210,Nature,“Ojos hacen fe.” Those are the words of Lucy M...


Finally, the rows listed in `df_grp_single_paragraph` are dropped from `df_grp`, which removes the fragments from the corpus.

In [15]:
# Drop the rows identified by df_grp_single_paragraph from df_grp
df_grp = df_grp.drop(df_grp_single_paragraph.index)
df_grp

Unnamed: 0,Source,Post Term,Post Tags,Title,Post URL,Authors,Post Date,Post ID,Category,Text
0,Greenpeace,Stories,Photography,Greenpeace Pictures of the Week,https://www.greenpeace.org/international/story...,Greenpeace International,2025-08-15 01:45:33,grp000000,Greenpeace,From a banner protest at the plastic treaty in...
1,Greenpeace,Stories,Forests,Environmental storytelling for a Chinese audie...,https://www.greenpeace.org/international/story...,August Rick,2025-08-14 01:40:25,grp000001,Nature,"Liu Min, better known as “Aunt Bear”, was one ..."
2,Greenpeace,Stories,AlternativeFutures,5 reasons Greenpeace calls for new global tax ...,https://www.greenpeace.org/international/story...,Nina Stros,2025-08-13 13:47:04,grp000002,Social and Economic Systems,Smoke from the Canadian wildfires drifting thr...
3,Greenpeace,Stories,Photography,Greenpeace Pictures of the Week,https://www.greenpeace.org/international/story...,Greenpeace International,2025-08-08 05:09:19,grp000003,Greenpeace,From a demonstration at the plastic treaty in ...
4,Greenpeace,Stories,"Peace, Nuclear",From Hiroshima to Gaza: defending peace,https://www.greenpeace.org/international/story...,Greenpeace France,2025-08-07 15:22:58,grp000004,Energy,"On August 6 and 9, 1945, two atomic bombs pulv..."
...,...,...,...,...,...,...,...,...,...,...
1352,Greenpeace,Stories,"AboutUs, 50Years",Dorothy Stowe 1920 – 2010,https://www.greenpeace.org/international/story...,Rex Weyler,2010-07-23 07:48:13,grp001352,Greenpeace,Greenpeace co-founder Dorothy Stowe passed awa...
1353,Greenpeace,Stories,"AboutUs, 50Years",Jim Bohlen 1926 – 2010,https://www.greenpeace.org/international/story...,Greenpeace International,2010-07-06 08:06:09,grp001353,Greenpeace,There’s an old joke that you can walk into any...
1354,Greenpeace,Stories,"AboutUs, 50Years",A chat with the first Rainbow Warriors,https://www.greenpeace.org/international/story...,Michael Friedrich,2007-06-24 12:03:00,grp001354,Greenpeace,"In Vancouver, on Canada’s Pacific coast, Green..."
1355,Greenpeace,Stories,"AboutUs, 50Years",Amchitka: the founding voyage,https://www.greenpeace.org/international/story...,Greenpeace International,2007-05-14 13:56:16,grp001355,Greenpeace,"In 1971, a small group of activists set sail t..."
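
The fragment-removal logic (detect single-paragraph posts, exempt the ones judged complete, drop the rest by index) can be sketched on toy data. The DataFrame `toy_posts`, its values, and the `keep_indexes` list are hypothetical and only illustrate the pattern.

```python
import pandas as pd

toy_posts = pd.DataFrame(
    {
        "Post ID": ["a", "b", "c", "d"],
        "Text": ["one\ntwo", "only a teaser", "complete single-paragraph post", "x\ny\nz"],
    }
)

# Posts without a newline consist of a single paragraph.
single = toy_posts[~toy_posts["Text"].str.contains("\n", regex=False)]

# Hypothetical manual decision: index 2 is a complete post, so it is exempted.
keep_indexes = [2]
fragments = single.drop(index=keep_indexes)

# Remove the remaining fragments from the main DataFrame by index label.
cleaned = toy_posts.drop(index=fragments.index)
```

Dropping by index label works here because the filtered DataFrame keeps the index of the DataFrame it was derived from.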


### Slice the `df_grp` DataFrame to extract a sample of 1,000 blog posts

The first attempt keeps only the posts dated before 1 December 2022 (that is, up to and including 30 November 2022) and then takes the first 1,000 rows of the filtered data as the `df_grp_1000` DataFrame.

In [16]:
# First, keep only the rows whose Post Date is strictly before 2022-12-01,
# i.e. posts published up to and including 2022-11-30
df_grp_sliced = df_grp[df_grp['Post Date'] < '2022-12-01']

# Then, from this first slice, slice the first 1,000 rows
df_grp_1000 = df_grp_sliced.iloc[:1000]

df_grp_1000

Unnamed: 0,Source,Post Term,Post Tags,Title,Post URL,Authors,Post Date,Post ID,Category,Text
438,Greenpeace,Stories,"Climate, Forests, Oceans",3 demands we’re making for biodiversity at COP15,https://www.greenpeace.org/international/story...,Agnès Le Rouzic,2022-11-30 18:52:08,grp000438,Nature,We are at a crucial turning point for the futu...
439,Greenpeace,Stories,Wins,Good news stories from around the world – Nove...,https://www.greenpeace.org/international/story...,Greenpeace International,2022-11-30 00:44:39,grp000439,Greenpeace,Citizens win the second round of an air pollut...
440,Greenpeace,Stories,EnergyRevolution,Big oil’s generational curse: pollutant-relate...,https://www.greenpeace.org/international/story...,Angelo Louw,2022-11-28 10:36:13,grp000440,Energy,Not much has changed in the 40 years since Sha...
441,Greenpeace,Stories,Consumption,Black Friday: My nightmare for the planet,https://www.greenpeace.org/international/story...,Alessandro Saccoccio,2022-11-25 10:16:31,grp000441,Greenpeace,I’ve had a recurring nightmare lately. In the ...
442,Greenpeace,Stories,"Food, Oceans","Sailing 1,000 km to save Thai mackerels",https://www.greenpeace.org/international/story...,Songwut Jullanan,2022-11-25 04:51:12,grp000442,Nature,Mackerel is the soul food of the Thai nation. ...
...,...,...,...,...,...,...,...,...,...,...
1352,Greenpeace,Stories,"AboutUs, 50Years",Dorothy Stowe 1920 – 2010,https://www.greenpeace.org/international/story...,Rex Weyler,2010-07-23 07:48:13,grp001352,Greenpeace,Greenpeace co-founder Dorothy Stowe passed awa...
1353,Greenpeace,Stories,"AboutUs, 50Years",Jim Bohlen 1926 – 2010,https://www.greenpeace.org/international/story...,Greenpeace International,2010-07-06 08:06:09,grp001353,Greenpeace,There’s an old joke that you can walk into any...
1354,Greenpeace,Stories,"AboutUs, 50Years",A chat with the first Rainbow Warriors,https://www.greenpeace.org/international/story...,Michael Friedrich,2007-06-24 12:03:00,grp001354,Greenpeace,"In Vancouver, on Canada’s Pacific coast, Green..."
1355,Greenpeace,Stories,"AboutUs, 50Years",Amchitka: the founding voyage,https://www.greenpeace.org/international/story...,Greenpeace International,2007-05-14 13:56:16,grp001355,Greenpeace,"In 1971, a small group of activists set sail t..."


This slice yields fewer than 1,000 blog posts. The date restriction is therefore relaxed so that posts published after 30/11/2022 are also included.

In [17]:
# Slice the last 1,000 rows, i.e. the rows at the end of the DataFrame, which hold the oldest posts
df_grp_1000 = df_grp.iloc[-1000:]

df_grp_1000

Unnamed: 0,Source,Post Term,Post Tags,Title,Post URL,Authors,Post Date,Post ID,Category,Text
354,Greenpeace,Stories,Oceans,Whales always had voices. Roger Payne helped t...,https://www.greenpeace.org/international/story...,Chris Greenberg,2023-06-13 19:47:42,grp000354,Greenpeace,Whales could always speak for themselves. Huma...
355,Greenpeace,Stories,"Climate, AlternativeFutures",The creativity of youth is changing mindsets o...,https://www.greenpeace.org/international/story...,Renata Nitta and Fabian Ogochukwu,2023-06-09 10:48:57,grp000355,Social and Economic Systems,All over the world people are coming together ...
356,Greenpeace,Stories,"Climate, Fires, Forests, ExtremeWeather, Photo...",Climate emergencies photos from the year so far,https://www.greenpeace.org/international/story...,Sudhanshu Malhotra,2023-06-05 05:26:30,grp000356,Energy,"As we celebrate World Environment Day today, i..."
357,Greenpeace,Stories,EnergyRevolution,Young professionals take on South Africa’s ene...,https://www.greenpeace.org/international/story...,Jeanette Meyer & Ellie Kouremenou,2023-06-02 10:59:58,grp000357,Energy,"In South Africa, Common Power is uplifting com..."
358,Greenpeace,Stories,"Climate, Food",10 big secrets of bees,https://www.greenpeace.org/international/story...,Greenpeace East Asia,2023-06-02 05:07:28,grp000358,Nature,This story was originally posted by Greenpeace...
...,...,...,...,...,...,...,...,...,...,...
1352,Greenpeace,Stories,"AboutUs, 50Years",Dorothy Stowe 1920 – 2010,https://www.greenpeace.org/international/story...,Rex Weyler,2010-07-23 07:48:13,grp001352,Greenpeace,Greenpeace co-founder Dorothy Stowe passed awa...
1353,Greenpeace,Stories,"AboutUs, 50Years",Jim Bohlen 1926 – 2010,https://www.greenpeace.org/international/story...,Greenpeace International,2010-07-06 08:06:09,grp001353,Greenpeace,There’s an old joke that you can walk into any...
1354,Greenpeace,Stories,"AboutUs, 50Years",A chat with the first Rainbow Warriors,https://www.greenpeace.org/international/story...,Michael Friedrich,2007-06-24 12:03:00,grp001354,Greenpeace,"In Vancouver, on Canada’s Pacific coast, Green..."
1355,Greenpeace,Stories,"AboutUs, 50Years",Amchitka: the founding voyage,https://www.greenpeace.org/international/story...,Greenpeace International,2007-05-14 13:56:16,grp001355,Greenpeace,"In 1971, a small group of activists set sail t..."
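
The date comparison `< '2022-12-01'` used in the first slice keeps every post timestamped on 30 November 2022. A comparison against `'2022-11-30'` is read as midnight at the start of that day and would drop posts published later on the same date. The sketch below illustrates the boundary on invented toy timestamps; `toy_dates` and `cutoff` are hypothetical names.

```python
import pandas as pd

toy_dates = pd.DataFrame(
    {
        "Post ID": ["p0", "p1", "p2", "p3"],
        "Post Date": pd.to_datetime(
            [
                "2022-12-01 00:00:00",
                "2022-11-30 18:52:08",
                "2022-11-30 00:00:00",
                "2022-11-29 09:00:00",
            ]
        ),
    }
)

# Strictly before 1 December: all of 30 November is retained.
cutoff = pd.Timestamp("2022-12-01")
before_december = toy_dates[toy_dates["Post Date"] < cutoff]

# Up to midnight at the start of 30 November: later posts that day are lost.
up_to_midnight_30_nov = toy_dates[toy_dates["Post Date"] <= pd.Timestamp("2022-11-30")]
```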


#### Export to a file

In [18]:
df_grp_1000.to_json(f"{output_directory}/{dataset_filename_3}.jsonl", orient='records', lines=True)

### Extract the blog posts

#### Import the data into a DataFrame

In [19]:
df_grp_1000 = pd.read_json(f"{output_directory}/{dataset_filename_3}.jsonl", lines=True)

In [20]:
df_grp_1000['Post Date'] = pd.to_datetime(df_grp_1000['Post Date'], unit='ms')

In [21]:
df_grp_1000

Unnamed: 0,Source,Post Term,Post Tags,Title,Post URL,Authors,Post Date,Post ID,Category,Text
0,Greenpeace,Stories,Oceans,Whales always had voices. Roger Payne helped t...,https://www.greenpeace.org/international/story...,Chris Greenberg,2023-06-13 19:47:42,grp000354,Greenpeace,Whales could always speak for themselves. Huma...
1,Greenpeace,Stories,"Climate, AlternativeFutures",The creativity of youth is changing mindsets o...,https://www.greenpeace.org/international/story...,Renata Nitta and Fabian Ogochukwu,2023-06-09 10:48:57,grp000355,Social and Economic Systems,All over the world people are coming together ...
2,Greenpeace,Stories,"Climate, Fires, Forests, ExtremeWeather, Photo...",Climate emergencies photos from the year so far,https://www.greenpeace.org/international/story...,Sudhanshu Malhotra,2023-06-05 05:26:30,grp000356,Energy,"As we celebrate World Environment Day today, i..."
3,Greenpeace,Stories,EnergyRevolution,Young professionals take on South Africa’s ene...,https://www.greenpeace.org/international/story...,Jeanette Meyer & Ellie Kouremenou,2023-06-02 10:59:58,grp000357,Energy,"In South Africa, Common Power is uplifting com..."
4,Greenpeace,Stories,"Climate, Food",10 big secrets of bees,https://www.greenpeace.org/international/story...,Greenpeace East Asia,2023-06-02 05:07:28,grp000358,Nature,This story was originally posted by Greenpeace...
...,...,...,...,...,...,...,...,...,...,...
995,Greenpeace,Stories,"AboutUs, 50Years",Dorothy Stowe 1920 – 2010,https://www.greenpeace.org/international/story...,Rex Weyler,2010-07-23 07:48:13,grp001352,Greenpeace,Greenpeace co-founder Dorothy Stowe passed awa...
996,Greenpeace,Stories,"AboutUs, 50Years",Jim Bohlen 1926 – 2010,https://www.greenpeace.org/international/story...,Greenpeace International,2010-07-06 08:06:09,grp001353,Greenpeace,There’s an old joke that you can walk into any...
997,Greenpeace,Stories,"AboutUs, 50Years",A chat with the first Rainbow Warriors,https://www.greenpeace.org/international/story...,Michael Friedrich,2007-06-24 12:03:00,grp001354,Greenpeace,"In Vancouver, on Canada’s Pacific coast, Green..."
998,Greenpeace,Stories,"AboutUs, 50Years",Amchitka: the founding voyage,https://www.greenpeace.org/international/story...,Greenpeace International,2007-05-14 13:56:16,grp001355,Greenpeace,"In 1971, a small group of activists set sail t..."
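
The `unit='ms'` conversion after re-import is consistent with datetimes being written to JSON Lines as epoch milliseconds. The sketch below reproduces the round trip in memory on two invented rows. It requests the epoch format explicitly and passes `convert_dates=False` so that the raw integers are visible; both arguments are choices made for this illustration and are not taken from the notebook.

```python
import io

import pandas as pd

original = pd.DataFrame(
    {
        "Post ID": ["grp000354", "grp000355"],
        "Post Date": pd.to_datetime(["2023-06-13 19:47:42", "2023-06-09 10:48:57"]),
    }
)

# Serialise to JSON Lines in memory, with dates as epoch milliseconds.
jsonl_text = original.to_json(
    orient="records", lines=True, date_format="epoch", date_unit="ms"
)

# Read the text back; convert_dates=False keeps the dates as raw integers.
restored = pd.read_json(io.StringIO(jsonl_text), lines=True, convert_dates=False)
raw_dtype_is_integer = pd.api.types.is_integer_dtype(restored["Post Date"])

# Convert the epoch milliseconds back to timestamps.
restored["Post Date"] = pd.to_datetime(restored["Post Date"], unit="ms")
```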


#### Extract the posts



In [22]:
for index, row in df_grp_1000.iterrows():
    file_name = f"{row['Post ID']}.txt"
    file_path = os.path.join(path, file_name)
    with open(file_path, 'w', encoding='utf-8') as file:
        file.write(row['Text'])

print(f'Successfully saved {len(df_grp_1000)} text files to {path}.')

Successfully saved 1000 text files to cl_st2_ph2_arianne/grp/02_extracted.
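
The export loop can be tried on toy data before running it on the full sample. This sketch assumes a temporary directory in place of the real output subdirectory and uses `itertuples` as a lighter alternative to `iterrows`; the DataFrame `toy_texts` and its contents are invented for the example.

```python
import os
import tempfile

import pandas as pd

toy_texts = pd.DataFrame(
    {
        "Post ID": ["grp000001", "grp000002"],
        "Text": ["First post.\nSecond paragraph.", "Café – naïve"],
    }
)

# Temporary directory standing in for the notebook's output subdirectory.
demo_dir = tempfile.mkdtemp()

# Write one UTF-8 text file per post, named after the Post ID.
for post_id, text in toy_texts[["Post ID", "Text"]].itertuples(index=False, name=None):
    with open(os.path.join(demo_dir, f"{post_id}.txt"), "w", encoding="utf-8") as f:
        f.write(text)

written = sorted(os.listdir(demo_dir))

# Read one file back to confirm that non-ASCII characters survive the round trip.
with open(os.path.join(demo_dir, "grp000002.txt"), encoding="utf-8") as f:
    round_trip = f.read()
```

Passing `encoding='utf-8'` on both write and read keeps accented characters and typographic punctuation intact regardless of the platform default.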
