
# 🌿 Detect Greenwashing with AI

In this notebook, you'll learn how to analyze corporate sustainability statements using **Python and Natural Language Processing (NLP)** tools.  
Your goal: detect *greenwashing*, the practice of companies exaggerating or misrepresenting their environmental commitments.

---

## 🎯 Learning Objectives
- Understand what greenwashing is  
- Learn basic text analysis (NLP) in Python  
- Identify vague vs concrete ESG language  
- Visualize differences and compute a “greenwashing index”


## 🧱 1. Load and inspect the text files

```python
from google.colab import files

# Step 1: upload your text files manually
uploaded = files.upload()  # select your 3 .txt files from your computer

# Step 2: decode each file and store it, lowercased, in a dictionary
texts = {}
for filename, content in uploaded.items():
    # files.upload() returns raw bytes; errors="replace" avoids crashing on stray non-UTF-8 bytes
    texts[filename] = content.decode("utf-8", errors="replace").lower()

# Step 3: preview the first 200 characters of each file
for name, content in texts.items():
    print(f"--- {name} ---")
    print(content[:200], "...\n")
```
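
Outside Colab, `google.colab.files` is not available. Here is a minimal sketch of an alternative, assuming your `.txt` files sit in a local folder (the folder name `reports` and the helper `load_texts` are hypothetical). It builds the same `texts` dictionary of lowercased file contents:

```python
from pathlib import Path

def load_texts(folder):
    """Hypothetical helper: read every .txt file in `folder` into {filename: lowercased text}."""
    texts = {}
    # sorted() keeps the file order stable across runs and operating systems
    for path in sorted(Path(folder).glob("*.txt")):
        texts[path.name] = path.read_text(encoding="utf-8").lower()
    return texts

# Hypothetical folder name; point this at wherever your files live.
# texts = load_texts("reports")
```

Non-`.txt` files in the folder are ignored, which mirrors the "select your 3 .txt files" step above.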

## 🔍 2. Tokenize and count words

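```python
import re
from collections import Counter

def word_frequency(text, top_n=10):
    # Tokenize on letters and digits so punctuation does not split counts ("climate," vs "climate")
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Drop very short words: a crude filter for "the", "and", ...
    words = [w for w in words if len(w) > 3]
    return Counter(words).most_common(top_n)

for name, text in texts.items():
    print(f"Top words in {name}:")
    print(word_frequency(text))
    print()
```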
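
Filtering by length (`len(w) > 3`) is a crude way to drop filler words. It also discards short but meaningful terms such as `co2` and still keeps common words like `with` or `that`. Below is a minimal sketch of explicit stopword removal. The small `STOPWORDS` set and the helper name `word_frequency_no_stopwords` are illustrative assumptions, not a standard list. Libraries such as NLTK ship fuller stopword lists.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); extend it for your own corpus.
STOPWORDS = {
    "the", "and", "that", "with", "this", "from", "for", "our", "are",
    "will", "have", "has", "been", "into", "their", "its", "we", "to",
    "of", "in", "a", "an", "is", "on", "by", "as", "at", "be", "it",
}

def word_frequency_no_stopwords(text, top_n=10):
    """Hypothetical helper: count word frequencies after dropping stopwords instead of short words."""
    # Keep runs of letters and digits so tokens like "co2" survive
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top_n)

print(word_frequency_no_stopwords("We cut CO2 by 30% and we will cut it again."))
```

Here `co2` is kept while `we`, `by` and `will` are removed, which the length filter cannot do reliably.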


## ⚖️ 3. Identify vague vs concrete language

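```python
import re

vague_words = ["commitment", "believe", "support", "aim", "aspire", "together", "vision", "values", "inspired"]
concrete_words = ["reduce", "recycle", "renewable", "co2", "metric", "target", "audit", "neutrality", "verified"]

def count_words(text, word_list):
    # Count every token that starts with a listed term, so "targets" counts for "target"
    # but "claim" does not count for "aim" (a plain substring test would match it)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(any(t.startswith(w) for w in word_list) for t in tokens)

for name, text in texts.items():
    v = count_words(text, vague_words)
    c = count_words(text, concrete_words)
    ratio = v / (c + 1)  # +1 avoids division by zero when no concrete terms appear
    print(f"{name}: vague={v}, concrete={c}, ratio={ratio:.2f}")
```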
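
Document-level counts hide *where* the vague language sits. Here is a minimal sketch that splits a text into sentences and lists those that use vague terms but no concrete ones. It makes two assumptions: the regular-expression splitter is naive and will mis-split abbreviations such as "e.g.", and matching is by token prefix. The helpers `split_sentences`, `has_term` and `flag_vague_sentences` are hypothetical. They take the word lists as arguments, so you can pass in `vague_words` and `concrete_words`.

```python
import re

def split_sentences(text):
    """Naive splitter (hypothetical helper): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def has_term(sentence, word_list):
    """True if any token in the sentence starts with a term from word_list."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return any(t.startswith(w) for t in tokens for w in word_list)

def flag_vague_sentences(text, vague, concrete):
    """Return sentences that contain vague terms but no concrete ones."""
    return [
        s for s in split_sentences(text)
        if has_term(s, vague) and not has_term(s, concrete)
    ]

# Usage in the notebook:
# for s in flag_vague_sentences(texts[name], vague_words, concrete_words):
#     print("-", s)
```

Reading the flagged sentences next to the ratio helps check whether a high score reflects genuinely empty claims or just a few generic phrases.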


## ☁️ 4. Visualize language with a word cloud

```python
!pip install wordcloud
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# One word cloud per document: the more often a word appears, the larger it is drawn
for name, text in texts.items():
    plt.figure(figsize=(6, 4))
    plt.imshow(WordCloud(width=800, height=400, background_color="white").generate(text))
    plt.axis("off")
    plt.title(name)
    plt.show()
```
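
A cloud of the whole text tends to be dominated by generic words. If you want to display only the ESG vocabulary from section 3, you can build a frequency dictionary restricted to those terms and pass it to `WordCloud.generate_from_frequencies`, which accepts a `{word: count}` dictionary. This is a minimal sketch. The helper name `lexicon_frequencies` is hypothetical, and it assumes prefix matching on tokens.

```python
import re
from collections import Counter

def lexicon_frequencies(text, lexicon):
    """Hypothetical helper: count how often each lexicon term appears as a token prefix."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for token in tokens:
        for term in lexicon:
            if token.startswith(term):
                counts[term] += 1
                break  # credit each token to at most one term
    return dict(counts)

# Usage in the notebook (requires wordcloud):
# freqs = lexicon_frequencies(texts[name], vague_words + concrete_words)
# cloud = WordCloud(width=800, height=400, background_color="white").generate_from_frequencies(freqs)
```

If a document contains none of the terms, the dictionary is empty and there is nothing to draw, so skip that document.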


## 💬 5. Sentiment analysis

```python
!pip install textblob
from textblob import TextBlob

for name, text in texts.items():
    # polarity ranges from -1 (negative) to 1 (positive); subjectivity from 0 (objective) to 1 (subjective)
    sentiment = TextBlob(text).sentiment
    print(f"{name}: polarity={sentiment.polarity:.2f}, subjectivity={sentiment.subjectivity:.2f}")
```
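
Printed scores are hard to compare at a glance. Because TextBlob's subjectivity runs from 0 (objective) to 1 (subjective), sorting on it gives a quick ranking of which statements lean most on opinion rather than fact. Here is a minimal sketch. It assumes you first collect the scores into a dictionary `{name: (polarity, subjectivity)}`, and both this structure and the helper `rank_by_subjectivity` are hypothetical.

```python
def rank_by_subjectivity(scores):
    """Hypothetical helper: sort {name: (polarity, subjectivity)} from most to least subjective."""
    return sorted(scores.items(), key=lambda item: item[1][1], reverse=True)

# Hypothetical way to fill `scores` in the notebook (TextBlob's sentiment is a (polarity, subjectivity) tuple):
# scores = {name: tuple(TextBlob(text).sentiment) for name, text in texts.items()}
# for name, (pol, subj) in rank_by_subjectivity(scores):
#     print(f"{name}: subjectivity={subj:.2f}, polarity={pol:.2f}")
```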


## 📊 6. Compare companies

```python
import pandas as pd
import matplotlib.pyplot as plt

def analyze_texts(texts):
    # Reuses count_words, vague_words and concrete_words from section 3
    data = []
    for name, text in texts.items():
        v = count_words(text, vague_words)
        c = count_words(text, concrete_words)
        ratio = v / (c + 1)  # higher = more vague language per concrete term
        data.append({"Company": name, "Vague": v, "Concrete": c, "Ratio": ratio})
    return pd.DataFrame(data)

results = analyze_texts(texts)
print(results)
results.plot(x="Company", y="Ratio", kind="bar", color="orange", legend=False,
             title="Greenwashing Index: vague / (concrete + 1)")
plt.show()
```
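
The ratio ignores document length. A 20-page report and a one-paragraph statement with the same vague and concrete counts get the same score. Here is a minimal sketch that also reports how many matching terms appear per 1,000 words, using prefix matching on tokens. The helper name `term_density` and the new column name are hypothetical.

```python
import re

def term_density(text, word_list, per=1000):
    """Hypothetical helper: tokens starting with a word_list term, per `per` tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0  # empty document: avoid dividing by zero
    hits = sum(any(t.startswith(w) for w in word_list) for t in tokens)
    return hits * per / len(tokens)

# Hypothetical usage, assuming `results` from the cell above (Company = file name):
# results["Vague per 1k"] = [term_density(texts[n], vague_words) for n in results["Company"]]
# results["Concrete per 1k"] = [term_density(texts[n], concrete_words) for n in results["Company"]]
```

Comparing densities alongside the ratio shows whether a high index comes from heavy use of vague terms or simply from a near-absence of concrete ones.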



## ✅ 7. Conclusion

You just performed a basic *text mining* analysis to detect potential greenwashing patterns.  
You learned how to:
- Read and clean text data  
- Count and categorize words  
- Score sentiment (polarity and subjectivity)  
- Visualize and compare language across companies  

ESG data and analytics firms such as **Clarity AI**, **RepRisk**, and **ESG Book** use more advanced versions of these methods to evaluate the credibility of ESG claims.


