
# 📝 6.2 Text Analysis for Qualitative Research

This notebook introduces text analysis techniques for qualitative nutrition research, focusing on processing survey responses.

**Objectives**:

- Preprocess text data using tokenisation and stopword removal.
- Perform word frequency analysis and visualisation.
- Apply techniques to `food_preferences.txt` to uncover hippo dietary preferences.

**Context**: Qualitative analysis of survey data, like hippo food preferences, reveals insights into dietary behaviours, complementing quantitative methods. 🦛

<details><summary>Fun Fact</summary>
Hippos love to express their food preferences, and text analysis helps us decode their crunchy cravings! 🦛
</details>

In [None]:
# Setup for Google Colab: Fetch datasets automatically or manually
%run ../../bootstrap.py    # installs requirements + editable package

import fns_toolkit as fns

import nltk  # For natural language processing
import pandas as pd  # For data manipulation
from nltk.tokenize import word_tokenize  # For splitting text into words
from nltk.corpus import stopwords  # For removing common words
from collections import Counter  # For counting word frequencies
import matplotlib.pyplot as plt  # For visualization

print('Python environment ready.')


In [None]:



# Download NLTK resources
nltk.download('punkt_tab')  # Tokenizer
nltk.download('stopwords')  # Stopwords list
print('Text analysis environment ready.')

Text analysis environment ready.


## Data Preparation

Load `food_preferences.txt`, containing 50 hippo survey responses, and preprocess the text.

In [None]:
# Load survey responses
file_path = fns.get_data_path("food_preferences")


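# Read one response per line, dropping trailing newlines and any blank lines
with open(file_path, 'r', encoding='utf-8') as f:
    responses = [line.strip() for line in f if line.strip()]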
print(responses[:2])

print(f'Number of responses: {len(responses)}')  # Display total responses
print(f'Sample response: {responses[0]}')  # Show first response

Number of responses: 50
Sample response: Hippo H1: I enjoy crunchy carrots.
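
The sample response starts with a respondent label (`Hippo H1:`) followed by the answer text. If it is useful to keep the two apart, a minimal sketch along these lines could work. The helper name `split_response` is hypothetical, and the sketch assumes every line follows the same `label: text` pattern as the sample shown above, which has not been checked against the full file.

```python
def split_response(line):
    """Split 'Hippo H1: text' into (respondent_label, text).

    Assumes at most one leading 'label:' prefix; lines without a colon
    are returned with a label of None.
    """
    prefix, sep, text = line.partition(":")
    if not sep:  # No colon found: treat the whole line as the answer
        return None, line.strip()
    return prefix.strip(), text.strip()


label, answer = split_response("Hippo H1: I enjoy crunchy carrots.")
print(label)   # Hippo H1
print(answer)  # I enjoy crunchy carrots.
```

Keeping the label separate would allow word counts to be traced back to individual respondents later on.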


## Text Preprocessing

Tokenize responses, convert to lowercase, and remove stopwords and punctuation.

In [3]:
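# Initialize stopwords, adding survey-specific terms such as 'hippo'
# (punctuation is also filtered out by isalpha() in the loop below)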
stop_words = set(stopwords.words('english')).union({':', '.', 'hippo', 'i'})

# Tokenize and clean responses
tokens = []
for response in responses:
    words = word_tokenize(response.lower())  # Convert to lowercase and tokenize
    clean_words = [word for word in words if word.isalpha() and word not in stop_words]
    tokens.extend(clean_words)

print(f'Sample tokens from first response: {tokens[:3]}')  # Show first few tokens

Sample tokens from first response: ['enjoy', 'crunchy', 'carrots']
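
To see what each cleaning step contributes, the following sketch repeats the idea on the single sample response using only the standard library. It is an approximation, not the NLTK pipeline above: it splits on whitespace instead of calling `word_tokenize`, and `STOP_WORDS` is a tiny hand-picked set standing in for NLTK's much longer English list. The function name `simple_clean` is hypothetical.

```python
import string

# Tiny illustrative stopword set (an assumption; NLTK's English list is far longer)
STOP_WORDS = {"i", "the", "a", "and", "hippo"}


def simple_clean(response):
    """Lowercase, split on whitespace, strip punctuation, then drop
    non-alphabetic tokens and stopwords."""
    cleaned = []
    for raw in response.lower().split():
        word = raw.strip(string.punctuation)  # 'carrots.' -> 'carrots', 'h1:' -> 'h1'
        if word.isalpha() and word not in STOP_WORDS:  # 'h1' fails isalpha()
            cleaned.append(word)
    return cleaned


print(simple_clean("Hippo H1: I enjoy crunchy carrots."))
# ['enjoy', 'crunchy', 'carrots']
```

The respondent code `h1` disappears because it contains a digit, which is the same reason the `isalpha()` check removes it in the NLTK version.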


## Word Frequency Analysis

Count and visualize the most common words in the responses.

In [4]:
# Count word frequencies
word_freq = Counter(tokens)
top_words = word_freq.most_common(5)
print(f'Top 5 words: {top_words}')

# Visualize word frequencies
words, counts = zip(*top_words)
plt.figure(figsize=(8, 6))
plt.bar(words, counts)
plt.title('Top 5 Words in Hippo Food Preferences')
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.show()  # Display bar plot

Top 5 words: [('carrots', 15), ('crunchy', 12), ('sweet', 10), ('enjoy', 8), ('greens', 7)]


<Figure size 800x600 with 1 Axes>
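
Raw counts are hard to compare across surveys of different sizes, so it can help to express each count as a share of all tokens. The sketch below shows the calculation on a toy token list that is made up for illustration and is not drawn from `food_preferences.txt`; the names `toy_tokens`, `toy_freq` and `relative` are hypothetical.

```python
from collections import Counter

# Toy token list (made up for illustration; not the survey data)
toy_tokens = ["carrots", "crunchy", "carrots", "sweet",
              "greens", "carrots", "crunchy", "sweet"]

toy_freq = Counter(toy_tokens)
total = sum(toy_freq.values())  # Total number of tokens

# Relative frequency: each word's count as a share of all tokens
relative = {word: count / total for word, count in toy_freq.most_common(3)}

for word, share in relative.items():
    print(f"{word}: {share:.1%}")
```

In the notebook itself, the same idea would apply to `word_freq`, dividing each count by `len(tokens)`.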

## Exercise: Analyze Adjectives

Modify the preprocessing to extract only adjectives (e.g., 'crunchy', 'sweet') and count their frequencies. Visualize the top 3 adjectives in a bar plot.

**Guidance**:
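- Use NLTK’s part-of-speech tagging (`nltk.pos_tag`) after downloading the tagger model with `nltk.download('averaged_perceptron_tagger_eng')` (older NLTK releases name it `averaged_perceptron_tagger`).
- Filter for adjectives (POS tags beginning with 'JJ', which also covers comparatives 'JJR' and superlatives 'JJS').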
- Create a bar plot of the top 3 adjectives.
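
As a hint for the filtering step only, the sketch below works on hand-written `(word, tag)` pairs in the shape that `nltk.pos_tag` returns. The pairs are illustrative and are not real tagger output, so the counts mean nothing beyond the example; producing the tags from the survey responses and plotting the result is left to the exercise.

```python
from collections import Counter

# Hand-written (word, tag) pairs in the shape nltk.pos_tag returns;
# illustrative only, not real tagger output.
tagged = [("enjoy", "VBP"), ("crunchy", "JJ"), ("carrots", "NNS"),
          ("sweet", "JJ"), ("greens", "NNS"), ("crunchy", "JJ"),
          ("sweeter", "JJR")]

# Keep words whose tag starts with 'JJ' (adjective tags in the Penn Treebank set)
adjectives = [word for word, tag in tagged if tag.startswith("JJ")]

adjective_freq = Counter(adjectives)
print(adjective_freq.most_common(3))
# [('crunchy', 2), ('sweet', 1), ('sweeter', 1)]
```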

**Answer**:

My adjective analysis code and results are as follows:

```python
# Your code here
```

**Top 3 Adjectives**:

- [Adjective 1]: [Count]
- [Adjective 2]: [Count]
- [Adjective 3]: [Count]

## Conclusion

You’ve applied text analysis to uncover dietary preferences from hippo survey responses, revealing key themes like 'crunchy carrots'.

**Next Steps**: Apply these skills to your own qualitative datasets or revisit earlier modules for quantitative analysis.

**Resources**:
- [NLTK Documentation](https://www.nltk.org/)
- [Text Analysis Tutorial](https://www.datacamp.com/community/tutorials/text-analytics-beginners-nltk)
- Repository: [github.com/ggkuhnle/data-analysis-toolkit-FNS](https://github.com/ggkuhnle/data-analysis-toolkit-FNS)