# Exercises

In [10]:
import re

You are working on a data science project that analyses social media posts about forest conservation. The exercises below extract specific information from the following collection of posts.

In [11]:
social_media_posts = """
Great news! The GreenWood Project has successfully planted 10000 trees in the Amazon Rainforest #GreenEarth #Conservation
Update: ForestCoverApp shows a 12% increase in forest cover in the last 5 years. #TechForGood
Sad to see illegal logging in Madagascan rainforests. We need stricter laws! #SaveForests #ActNow
Celebrating World Environment Day with a pledge to plant 20000 trees. Join us! #EnvironmentDay #GoGreen
Interesting study published in NatureJournal: Rainforest biodiversity is crucial for ecological balance. #ScienceForNature
"""

### Exercise 1
**Scenario**: Your first task is to analyse the hashtags used in these posts, as they can give insights into popular environmental campaigns.

**Exercise**: Write a Python function called `extract_hashtags` that returns all unique hashtags in the social media posts.
```python
# insert code here

# Test with the provided text
print(extract_hashtags(social_media_posts))
```

In [13]:
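def extract_hashtags(text):
    # A hashtag is '#' followed by one or more word characters
    pattern = r'#\w+'
    hashtags = re.findall(pattern, text)
    # A set keeps each hashtag once; its iteration order is arbitrary
    return set(hashtags)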

# Test with the provided text
print(extract_hashtags(social_media_posts))

{'#Conservation', '#ActNow', '#ScienceForNature', '#EnvironmentDay', '#SaveForests', '#TechForGood', '#GoGreen', '#GreenEarth'}
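
Because `extract_hashtags` returns a set, the printed order can differ between runs, and the number of times each hashtag appears is lost. If the aim is to spot popular campaigns, a per-hashtag count may be more informative. The following is a minimal sketch, not part of the exercise solution: `count_hashtags` is a hypothetical helper, the sample string is made up for illustration, and folding hashtags to lower case is an assumption about how variants should be grouped.

```python
import re
from collections import Counter

def count_hashtags(text):
    """Return a Counter mapping each lower-cased hashtag to its frequency."""
    # Hypothetical helper: same pattern as extract_hashtags, but keeps duplicates
    hashtags = re.findall(r'#\w+', text)
    # Assumption: '#GoGreen' and '#gogreen' should count as the same hashtag
    return Counter(tag.lower() for tag in hashtags)

# Made-up sample text, for illustration only
sample = "Plant trees #GoGreen. Protect wildlife #gogreen #SaveForests"
hashtag_counts = count_hashtags(sample)
print(hashtag_counts.most_common(1))  # [('#gogreen', 2)]
```

In the posts above every hashtag occurs exactly once, so the counts only become useful on a larger collection.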


### Exercise 2
**Scenario**: You want to quantify the impact of these conservation efforts. For this, extracting numerical data from the posts will be helpful.

**Exercise**: Write a Python function called `extract_numbers` that finds all numbers mentioned in the posts.
```python
# insert code here

# Test with the provided text
print(extract_numbers(social_media_posts))
```

In [16]:
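def extract_numbers(text):
    # One or more digits delimited by word boundaries, e.g. the 12 in "12%"
    pattern = r'\b\d+\b'
    numbers = re.findall(pattern, text)
    # re.findall returns the matches as strings, not integers
    return numbers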

# Test with the provided text
print(extract_numbers(social_media_posts))

['10000', '12', '5', '20000']
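
The result mixes tree counts (10000, 20000) with a percentage (12) and a duration in years (5), and every value is a string. To quantify planting efforts specifically, one option is to keep only numbers that are immediately followed by the word "trees" and convert them to integers. This is a sketch under that assumption; `extract_tree_counts` is a hypothetical helper, and the sample reuses three of the posts above.

```python
import re

def extract_tree_counts(text):
    """Return the integers that are directly followed by the word 'trees'."""
    # The capture group keeps only the digits; '\s+trees\b' supplies the context
    return [int(n) for n in re.findall(r'\b(\d+)\s+trees\b', text)]

# Three of the posts from the exercise text
sample = (
    "The GreenWood Project has successfully planted 10000 trees in the Amazon Rainforest\n"
    "ForestCoverApp shows a 12% increase in forest cover in the last 5 years.\n"
    "Celebrating World Environment Day with a pledge to plant 20000 trees."
)
tree_counts = extract_tree_counts(sample)
print(tree_counts)       # [10000, 20000]
print(sum(tree_counts))  # 30000
```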


### Exercise 3
**Scenario**: To understand public sentiment, you need to count how often words related to negative impacts (like "illegal", "logging") appear.

**Exercise**: Write a function called `count_specific_words` that counts the occurrences of the words "illegal" and "logging" in the posts.
```python
# insert code here

# Test with the provided text
print(count_specific_words(social_media_posts))
```

In [20]:
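def count_specific_words(text):
    # \b anchors match whole words only, so "logging" does not match "blogging"
    pattern = r'\billegal\b|\blogging\b'
    # Returns the combined number of matches for both words
    return len(re.findall(pattern, text))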

# Test with the provided text
print(count_specific_words(social_media_posts))

2
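
The function returns one combined total, so it does not show how often each word occurs on its own. A possible variant, sketched below, reports a count per word and also ignores case so that a sentence-initial "Illegal" is counted. `count_words_by_term` and its `words` parameter are hypothetical names, and the sample sentence is made up for illustration.

```python
import re
from collections import Counter

def count_words_by_term(text, words=("illegal", "logging")):
    """Return a Counter with one case-insensitive count per target word."""
    # Builds \b(?:illegal|logging)\b; re.escape guards against special characters
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    matches = re.findall(pattern, text, flags=re.IGNORECASE)
    # Lower-case the matches so "Illegal" and "illegal" share one key
    return Counter(m.lower() for m in matches)

# Made-up sample text, for illustration only
sample = "Illegal logging continues. Logging permits were revoked after illegal sales."
word_counts = count_words_by_term(sample)
print(word_counts["illegal"], word_counts["logging"])  # 2 2
```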


### Exercise 4
**Scenario**: For geographical analysis, you need to extract mentioned locations such as "Amazon Rainforest" and "Madagascan rainforests".

**Exercise**: Write a function called `extract_locations` that extracts proper names referring to locations.
```python
# insert code here

# Test with the provided text
print(extract_locations(social_media_posts))
```

In [21]:
def extract_locations(text):
    # Match runs of capitalised words, optionally followed by "rainforest(s)"
    pattern = r'\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*(?:\s[rR]ainforests?)?\b'
    potential_locations = re.findall(pattern, text)
    # Keep phrases that name a rainforest; drop the bare word "Rainforest"
    locations = [
        loc for loc in potential_locations
        if 'rainforest' in loc.lower() and loc != 'Rainforest'
    ]
    return locations
    
# Test with the provided text
print(extract_locations(social_media_posts))

['Amazon Rainforest', 'Madagascan rainforests']
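
The solution first matches every run of capitalised words and then filters the results. For this particular text, a narrower pattern that requires a single capitalised word directly before "rainforest(s)" appears to give the same answer without a filtering step. This is a sketch only: `extract_rainforest_names` is a hypothetical helper, it would miss multi-word names and locations that are not rainforests, and the sample reuses three of the posts above.

```python
import re

def extract_rainforest_names(text):
    """Return phrases of the form '<Capitalised word> rainforest(s)'."""
    # One capitalised word, one whitespace character, then rainforest/Rainforest(s)
    pattern = r'\b[A-Z][a-z]+\s[Rr]ainforests?\b'
    return re.findall(pattern, text)

# Three of the posts from the exercise text
sample = (
    "The GreenWood Project has successfully planted 10000 trees in the Amazon Rainforest\n"
    "Sad to see illegal logging in Madagascan rainforests. We need stricter laws!\n"
    "Interesting study published in NatureJournal: Rainforest biodiversity is crucial."
)
rainforest_names = extract_rainforest_names(sample)
print(rainforest_names)  # ['Amazon Rainforest', 'Madagascan rainforests']
```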
