<a href="https://colab.research.google.com/github/jbloewencolon/Analyzing-The-Doctrine-of-Discovery/blob/main/Analyzing_the_DoD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This project conducts a language analysis of the documents that make up the "Doctrine of Christian Discovery." Although the dataset is small, the analysis applies sentiment analysis, topic modeling, and comparative text analysis to gain insight into historical perspectives on Indigenous peoples, gold, and religious matters.

Step 1: Setting Up the Python Environment
First, we need to set up our Python environment with the necessary libraries.

In [None]:
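# Importing required libraries
import pandas as pd
import nltk
from nltk.corpus import stopwords
from gensim import corpora, models
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Download the NLTK data used in Step 2 (tokenizer models and stopword list)
nltk.download('punkt')
nltk.download('punkt_tab')  # needed by newer NLTK releases
nltk.download('stopwords')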


Step 2: Data Collection and Preprocessing
After sourcing the documents, we'll convert them into a text-readable format and clean the data.

In [None]:
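# Example: Reading a text file (explicit encoding avoids platform-dependent defaults)
with open('document.txt', 'r', encoding='utf-8') as file:
    document = file.read()

# Basic preprocessing: lowercase, then split into word tokens
document = document.lower()
document = nltk.word_tokenize(document)

# Removing stopwords and non-alphabetic tokens (punctuation, numbers)
stop_words = set(stopwords.words('english'))
document = [word for word in document if word.isalpha() and word not in stop_words]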
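Because the corpus contains several documents, the same cleaning steps can be wrapped in a function and applied to each text. The sketch below is one possible way to do that, not the notebook's own method: it uses a simple regular-expression tokenizer and a tiny, hypothetical stopword set (`SAMPLE_STOPWORDS`) so that it runs without NLTK data, and the two sample sentences are invented placeholders rather than quotations from the source documents.

```python
import re

# Hypothetical, deliberately tiny stopword set for illustration only;
# the notebook itself uses NLTK's English stopword list.
SAMPLE_STOPWORDS = {"the", "of", "and", "to", "in", "a", "that", "by"}

def preprocess(text, stop_words=SAMPLE_STOPWORDS):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in stop_words]

# Invented sample sentences standing in for the real documents.
raw_documents = {
    "doc_a": "The lands and islands discovered by the envoys.",
    "doc_b": "To the kings, in perpetuity, the gold of that region.",
}

# One cleaned token list per document, keyed by document name.
cleaned = {name: preprocess(text) for name, text in raw_documents.items()}
```

Keeping one token list per document, rather than a single merged list, is what makes the later per-document comparisons possible.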


Step 3: Exploratory Data Analysis (EDA)
We'll start with some basic EDA to understand our dataset better.

In [None]:
# Word Frequency Distribution
freq_dist = nltk.FreqDist(document)
freq_dist.plot(30, cumulative=False)
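Raw counts are hard to compare between texts of different lengths, so it can help to express the frequency of a few tracked terms per 1,000 tokens. The following is a minimal sketch under that assumption; the helper name `rate_per_thousand`, the token list, and the tracked terms are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def rate_per_thousand(tokens, terms):
    """Return occurrences per 1,000 tokens for each tracked term."""
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # Avoid dividing by zero for an empty document.
        return {term: 0.0 for term in terms}
    return {term: 1000 * counts[term] / total for term in terms}

# Hypothetical token list and tracked terms, for illustration only.
sample_tokens = ["gold", "faith", "lands", "gold", "islands", "faith", "gold", "kings"]
tracked = ["gold", "faith", "church"]
rates = rate_per_thousand(sample_tokens, tracked)
```

With the notebook's data, the token list from Step 2 could be passed in place of `sample_tokens`.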


Step 4: Sentiment Analysis
Next, we'll analyze the sentiment of the text.

In [None]:
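# Note: `document` was lowercased and stripped of stopwords in Step 2
# (including negations such as 'not'), so treat these scores as rough estimates.

# Using TextBlob for sentiment analysis (returns polarity and subjectivity)
blob = TextBlob(" ".join(document))
print(blob.sentiment)

# Using Vader Sentiment Analyzer (returns neg/neu/pos/compound scores)
analyzer = SentimentIntensityAnalyzer()
print(analyzer.polarity_scores(" ".join(document)))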
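A single score for a whole document can hide a lot of variation, so one option is to score the text sentence by sentence and then summarize. The sketch below shows the idea with a scorer passed in as a function. `score_by_sentence` and `toy_scorer` are hypothetical names, the sentence splitter is a rough regular expression, and the toy scorer exists only so the example runs without any sentiment library.

```python
import re
from statistics import mean

def score_by_sentence(text, score_fn):
    """Split text into rough sentences and score each one with score_fn."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(s, score_fn(s)) for s in sentences]

def toy_scorer(sentence):
    """Hypothetical stand-in scorer: +1 per 'good', -1 per 'bad'."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return words.count("good") - words.count("bad")

# Invented sample text, for illustration only.
sample_text = "This is good. This is bad! Is this good or bad? Good, good."
scored = score_by_sentence(sample_text, toy_scorer)
average = mean(score for _, score in scored)
```

With VADER, the scorer could be `lambda s: analyzer.polarity_scores(s)["compound"]`. This approach needs the raw text rather than the token list, since sentence boundaries are lost after tokenization.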


Step 5: Topic Modeling
We'll use LDA to identify prominent topics in the text.

In [None]:
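# Preparing data for LDA
# Note: the corpus here contains a single document; LDA is more informative
# when it is given several documents (or sections) to compare.
dictionary = corpora.Dictionary([document])
corpus = [dictionary.doc2bow(text) for text in [document]]

# Applying LDA (random_state makes the topics reproducible between runs)
lda_model = models.ldamodel.LdaModel(corpus, num_topics=3, id2word=dictionary, passes=15, random_state=42)
print(lda_model.print_topics())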
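LDA infers topics from how words co-occur across documents, so a corpus with only one document gives it little to work with. One common workaround, sketched here as an assumption rather than as this project's method, is to split a long token list into fixed-size chunks and treat each chunk as a pseudo-document. The helper `chunk_tokens` and the chunk size are hypothetical.

```python
def chunk_tokens(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

# Placeholder tokens w0..w9 standing in for a real token list.
sample_tokens = [f"w{i}" for i in range(10)]
chunks = chunk_tokens(sample_tokens, 4)

# Each chunk can then be treated as its own document, for example:
#   dictionary = corpora.Dictionary(chunks)
#   corpus = [dictionary.doc2bow(chunk) for chunk in chunks]
```

The chunk size is a judgment call: smaller chunks give LDA more pseudo-documents but less context in each one.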


Step 6: Comparative Text Analysis
We'll compare the frequency and context of key words.

In [None]:
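# Example of comparative analysis: build a word-count table for the document
vectorizer = CountVectorizer()
X = vectorizer.fit_transform([" ".join(document)])
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
word_freq = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())
print(word_freq)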
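The count table covers frequency; to look at the context of a key word, a keyword-in-context (KWIC) listing shows the words on either side of each occurrence. The function below is a minimal sketch of that technique. The name `keyword_in_context`, the window size, and the sample sentence are hypothetical and only illustrate the output shape.

```python
def keyword_in_context(tokens, keyword, window=3):
    """Return each occurrence of keyword with up to `window` tokens on either side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits

# Invented sample sentence, for illustration only.
sample_tokens = "the kings sought gold and the church sought souls and gold".split()
contexts = keyword_in_context(sample_tokens, "gold", window=2)
```

Context is easier to read on tokens that still include stopwords, so this is best run before stopword removal. NLTK offers a similar view through `nltk.Text(tokens).concordance(word)`.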


Step 7: Visualizations and Reporting
We'll create visualizations to effectively communicate our findings.

In [None]:
# Example: Creating a Word Cloud
from wordcloud import WordCloud
import matplotlib.pyplot as plt

wordcloud = WordCloud().generate(" ".join(document))

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()


Step 8: Further Research and Analysis
Suggestions for further research and analysis go here.

Step 9: Ethical Considerations
Reflect on the ethical aspects of analyzing such historical texts.

Step 10: Limitations and Future Work
Acknowledge the limitations due to the small size of the dataset and propose future research directions.