<a href="https://colab.research.google.com/github/ruebot/notebooks/blob/main/yfile_indigenous_mallet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Cribbed from https://melaniewalsh.github.io/Intro-Cultural-Analytics/05-Text-Analysis/09-Topic-Modeling-Without-Mallet.html

In [1]:
%%capture

import os
def install_java():
  !apt-get install -y openjdk-8-jdk-headless -qq > /dev/null
  os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
  !java -version   
install_java()

!wget http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip
!unzip mallet-2.0.8.zip

In [2]:
%%capture
!pip install tomotopy
!pip install little_mallet_wrapper
!pip install seaborn
!pip install git+https://github.com/maria-antoniak/little-mallet-wrapper.git

In [3]:
import tomotopy as tp
import little_mallet_wrapper
import seaborn
import glob
from pathlib import Path

os.environ['MALLET_HOME'] = '/content/mallet-2.0.8'
path_to_mallet = '/content/mallet-2.0.8/bin/mallet'

In [4]:
%%capture
!wget https://www.dropbox.com/s/xp6uh7zhem32b0m/corpus.zip
!unzip corpus.zip

In [5]:
files = glob.glob("corpus/*.txt")

In [13]:
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")
stopwords = nltk.corpus.stopwords.words('english')
additional_stopwords = ['yfile', 'news', 'york', 'university', 'newsletter', 'editor', 'picks', 'subscribe', 'skip', 'content', 'latest','advancement', 'alumni', 'ampd', 'brainstorm', 'community', 'convocation', 'covid19', 'csbo', 'education', 'ee', 'emaillead', 'engineering', 'faculty', 'featured', 'fes', 'fgs', 'finearts', 'glendon', 'graduate', 'health', 'innovatus', 'international', 'laps', 'lassonde', 'libraries', 'lions', 'osgoode', 'president', 'research', 'schulich', 'science', 'senate', 'sports', 'staff', 'student', 'students', 'sustainability', 'teaching', 'teaching and learning', 'top stories', 'vpacademic', 'vpri', 'archives', 'awards', 'recognition', 'features', 'innovatus', 'innovation', 'special', 'issues', 'spotlight', 'issues', 'aifeatured', 'take', 'note', 'learning', 'scoop', 'focus', 'media', 'archive', 'home', 'scoop', 'upcoming', 'events', 'contact','share', 'keele', 'markham', 'campus', 'privacy', 'legal', 'u', 'recent', 'careers', 'accessibility', 'safety']
stopwords.extend(additional_stopwords)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
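
A custom stop list like the one above can be audited before use. The sketch below assumes that stop words are matched against individual whitespace-separated tokens (an assumption about how `process_string` filters), in which case entries containing a space would never match and repeated entries add nothing. `audit_stopwords` is a hypothetical helper, and the sample list is a short illustrative excerpt, not the full list.

```python
def audit_stopwords(words):
    """Return (unique single-token entries in first-seen order, multi-word entries)."""
    seen = set()
    unique, multiword = [], []
    for word in words:
        if word in seen:
            continue  # skip repeated entries
        seen.add(word)
        if " " in word:
            multiword.append(word)  # cannot equal a single whitespace-split token
        else:
            unique.append(word)
    return unique, multiword

# Illustrative excerpt only (hypothetical sample, not the notebook's full list)
sample = ['scoop', 'issues', 'top stories', 'issues', 'scoop', 'teaching and learning']
unique, multiword = audit_stopwords(sample)
print(unique)
print(multiword)
```

If multi-word phrases do need to be removed, one option is to strip them from the raw text before tokenising.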


In [14]:
training_data = []
original_texts = []
titles = []

for file in files:
    text = Path(file).read_text(encoding='utf-8')
    processed_text = little_mallet_wrapper.process_string(text, lowercase=True, remove_short_words=True, remove_stop_words=True, remove_punctuation=True, numbers='remove', stop_words=stopwords)
    training_data.append(processed_text)
    original_texts.append(text)
    titles.append(Path(file).stem)

In [15]:
len(training_data), len(original_texts), len(titles)

(599, 599, 599)

In [16]:
# Number of topics to return
num_topics = 10
# Number of topic words to print out
num_topic_words = 5

# Initialize the model
model = tp.LDAModel(k=num_topics)

# Add each document to the model, after splitting it up into words
for text in training_data:
    model.add_doc(text.strip().split())
    
print("Topic Model Training...\n\n")
# Train in two rounds of 50 iterations (100 in total), printing the per-word log-likelihood after each round
iterations = 50
for i in range(0, 100, iterations):
    model.train(iterations)
    print(f'Iteration: {i}\tLog-likelihood: {model.ll_per_word}')

print("\nTopic Model Results:\n\n")
# Print out the top num_topic_words words for each topic
topics = []
topic_individual_words = []
for topic_number in range(0, num_topics):
    topic_words = ' '.join(word for word, prob in model.get_topic_words(topic_id=topic_number, top_n=num_topic_words))
    topics.append(topic_words)
    topic_individual_words.append(topic_words.split())
    print(f"✨Topic {topic_number}✨\n\n{topic_words}\n")

Topic Model Training...


Iteration: 0	Log-likelihood: -9.064641308760516
Iteration: 50	Log-likelihood: -8.990947210621997

Topic Model Results:


✨Topic 0✨

professor story toronto full read

✨Topic 1✨

program school law professor academic

✨Topic 2✨

studies human rights conference professor

✨Topic 3✨

social canada studies black work

✨Topic 4✨

change global project climate communities

✨Topic 5✨

des les pour vice sur

✨Topic 6✨

dance film arts canadian award

✨Topic 7✨

said says people could would

✨Topic 8✨

indigenous aboriginal first people peoples

✨Topic 9✨

new history art world work



In [17]:
topic_distributions = [list(doc.get_topic_dist()) for doc in model.docs]
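
Each entry in `topic_distributions` is a list of per-topic probabilities for one document. A minimal sketch of how such lists can be used to find each document's dominant topic, run on made-up numbers rather than the model's output. `dominant_topic`, `toy_distributions` and `toy_titles` are hypothetical names introduced for illustration.

```python
def dominant_topic(distribution):
    """Return (topic_index, probability) for the largest entry in one distribution."""
    topic_index = max(range(len(distribution)), key=lambda i: distribution[i])
    return topic_index, distribution[topic_index]

# Toy distributions for three documents over three topics (illustrative values only)
toy_distributions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.25, 0.5, 0.25],
]
toy_titles = ["doc-a", "doc-b", "doc-c"]

for title, dist in zip(toy_titles, toy_distributions):
    topic_index, probability = dominant_topic(dist)
    print(f"{title}: topic {topic_index} ({probability})")
```

With the real data, the same function could be applied to each entry of `topic_distributions` alongside `titles`.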

In [18]:
from IPython.display import Markdown, display
import re

def make_md(string):
    display(Markdown(str(string)))

def get_top_docs(docs, topic_distributions, topic_index, n=5):
    
    sorted_data = sorted([(_distribution[topic_index], _document) 
                          for _distribution, _document 
                          in zip(topic_distributions, docs)], reverse=True)
    
    topic_words = topics[topic_index]
    
    make_md(f"### ✨Topic {topic_index}✨\n\n{topic_words}\n\n---")
    
    for probability, doc in sorted_data[:n]:
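        # Bold every whole-word occurrence of each topic word, ignoring case
        for word in topic_words.split():
            if word in doc.lower():
                # flags must be passed by keyword: the fourth positional argument of re.sub is count
                doc = re.sub(rf"\b{re.escape(word)}\b", lambda m: f"**{m.group(0)}**", doc, flags=re.IGNORECASE)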
        
        make_md(f'✨  \n**Topic Probability**: {probability}  \n**Document**: {doc}\n\n')
    
    return
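
The highlighting step can be tried in isolation. A minimal sketch of case-insensitive, whole-word bolding that keeps each match's original capitalisation. `bold_words` is a hypothetical helper, not part of the notebook. Since `re.sub` takes `count` as its fourth positional argument, the flag is passed by keyword here.

```python
import re

def bold_words(text, words):
    """Wrap whole-word, case-insensitive matches of each word in Markdown bold."""
    for word in words:
        pattern = rf"\b{re.escape(word)}\b"
        # flags must be passed by keyword; the fourth positional argument is count
        text = re.sub(pattern, lambda m: f"**{m.group(0)}**", text, flags=re.IGNORECASE)
    return text

print(bold_words("Indigenous peoples and indigenous rights", ["indigenous", "peoples"]))
```

The word boundaries keep a shorter topic word such as "people" from matching inside "peoples".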

In [21]:
get_top_docs(titles, topic_distributions, topic_index=8, n=30)

### ✨Topic 8✨

indigenous aboriginal first people peoples

---

✨  
**Topic Probability**: 0.7120667099952698  
**Document**: 2021-09-26-euc-seminar-series-examines-treaty-relations-in-toronto



✨  
**Topic Probability**: 0.7002729177474976  
**Document**: 2021-11-16-euc-seminar-series-looks-at-**indigenous**-involvement-in-city-planning-nov-23



✨  
**Topic Probability**: 0.6513477563858032  
**Document**: 2012-06-15-learn-about-**indigenous**-**peoples**-and-sample-bannock-bread



✨  
**Topic Probability**: 0.6455286741256714  
**Document**: 2014-10-23-spirit-vigil-raises-awareness-about-violence-against-**indigenous**-women



✨  
**Topic Probability**: 0.6427420973777771  
**Document**: 2008-01-23-national-chief-of-the-assembly-of-**first**-nations-to-speak-at-york



✨  
**Topic Probability**: 0.6366148591041565  
**Document**: 2022-01-25-the-euc-seminar-series-to-discuss-the-treaty-of-niagara



✨  
**Topic Probability**: 0.632992148399353  
**Document**: 2021-10-19-euc-seminar-series-continues-with-talk-on-**indigenous**-environmental-justice



✨  
**Topic Probability**: 0.6296430826187134  
**Document**: 2022-02-09-euc-seminar-series-to-discuss-the-toronto-purchase



✨  
**Topic Probability**: 0.6226596832275391  
**Document**: 2013-10-10-sisters-in-spirit-rally-honours-missing-and-murdered-**indigenous**-women



✨  
**Topic Probability**: 0.5950532555580139  
**Document**: 2014-06-17-celebrate-national-**aboriginal**-day-at-york-university-wednesday



✨  
**Topic Probability**: 0.5609232187271118  
**Document**: 2010-06-21-today-is-national-**aboriginal**-day-a-time-to-celebrate-a-diversity-of-cultures



✨  
**Topic Probability**: 0.5457831621170044  
**Document**: 2021-09-09-two-**indigenous**-educators-join-the-faculty-of-education



✨  
**Topic Probability**: 0.5383880734443665  
**Document**: 2022-06-17-reflecting-on-national-**indigenous**-**peoples**-day-and-national-**indigenous**-**peoples**-month



✨  
**Topic Probability**: 0.5279505848884583  
**Document**: 2021-06-20-june-is-national-**indigenous**-history-month-and-june-21-is-national-**indigenous**-**peoples**-day-national-indigenous-**peoples**-day-featured-image



✨  
**Topic Probability**: 0.5257638096809387  
**Document**: 2021-06-20-june-is-national-**indigenous**-history-month-and-june-21-is-national-**indigenous**-**peoples**-day



✨  
**Topic Probability**: 0.5235186815261841  
**Document**: 2021-09-28-orange-shirts-available-for-purchase-at-york-u-bookstore



✨  
**Topic Probability**: 0.5106313228607178  
**Document**: 2013-03-14-york-unveils-its-own-tipi-a-place-to-learn-and-foster-awareness



✨  
**Topic Probability**: 0.5098571181297302  
**Document**: 2017-06-21-york-universitys-hart-house-renamed-to-create-safe-space-for-**indigenous**-**peoples**



✨  
**Topic Probability**: 0.50813227891922  
**Document**: 2009-11-23-new-program-infuses-**first**-nations-culture-and-perspectives-into-teacher-training



✨  
**Topic Probability**: 0.5052446126937866  
**Document**: 2007-10-17-newsletter-reaches-out-to-**aboriginal**-students



✨  
**Topic Probability**: 0.48970893025398254  
**Document**: 2012-04-05-annual-pow-wow-brightens-vari-hall



✨  
**Topic Probability**: 0.4811217784881592  
**Document**: 2013-02-26-three-day-celebration-of-**first**-nations-and-academia-starts-thursday



✨  
**Topic Probability**: 0.4785768687725067  
**Document**: 2022-06-24-attend-an-**indigenous**-walk-on-campus



✨  
**Topic Probability**: 0.4700418710708618  
**Document**: 2021-06-20-the-centre-for-**aboriginal**-student-services-changes-its-name-to-the-centre-for-**indigenous**-student-services



✨  
**Topic Probability**: 0.4551788568496704  
**Document**: 2011-05-20-new-project-gives-a-voice-to-**indigenous**-**people**-with-disabilities



✨  
**Topic Probability**: 0.4530550241470337  
**Document**: 2005-03-03-asay-holds-third-annual-**aboriginal**-awareness-day-and-powwow



✨  
**Topic Probability**: 0.4516863226890564  
**Document**: 2022-06-08-second-meeting-of-**indigenous**-book-club-to-consider-award-winning-braiding-sweetgrass



✨  
**Topic Probability**: 0.4459937810897827  
**Document**: 2009-11-27-mondays-research-matters-features-a-chat-with-coyote-and-raven



✨  
**Topic Probability**: 0.44141528010368347  
**Document**: 2012-02-27-**aboriginal**-students-present-symposium-with-powwow



✨  
**Topic Probability**: 0.43801650404930115  
**Document**: 2009-03-05-**aboriginal**-awareness-days-powwow-start-today



In [22]:
get_top_docs(original_texts, topic_distributions, topic_index=8, n=5)

### ✨Topic 8✨

indigenous aboriginal first people peoples

---

✨  
**Topic Probability**: 0.7120667099952698  
**Document**: EUC Seminar Series examines treaty relations in Toronto
Toronto is the traditional territory of the Wendat, Anishnaabeg and Haudenosaunee Confederacies. It is also one of the most culturally diverse cities on Earth. There is a web of historical treaties that were negotiated on these lands – agreements that hold continued relevance and possibility for the present.
Polishing the Chain: Treaty Relations in Toronto is a fall and winter conversation series that will bring together Indigenous and allied scholars, knowledge holders, artists, Earth workers and activists who will explore the historical significance and contemporary relevance of the treaties Indigenous nations in southern Ontario have made with each other, with the land and with the Crown. It will explore: the spirit and intent of Toronto treaties; the ways Indigenous Peoples have upheld and continue to uphold them; the extent to which they are (and are not) reflected in contemporary Indigenous and state relations; and the treaty responsibilities of both settler and Indigenous Torontonians.
The series’ inaugural talk, “The Symbolic Language of Wampum Diplomacy,” will take place on Tuesday, Sept. 28, just prior to Canada’s **first** National Day for Truth and Reconciliation on Sept. 30. The event, co-presented with the Toronto Biennial of Art, will feature Anishinaabe historian and York University Assistant Professor Alan Corbiere, Canada Research Chair in Indigenous History of North America; Tuscarora writer, historian and curator Rick Hill; and interdisciplinary Kanienkehaka artist Ange Loft.
The series will continue with “Taking Care of the Dish: Treaties, Indigenous Law and Environmental Justice” on Oct. 26; “Treaty Relations, Planning and Indigenous Consultation at the City of Toronto” on Nov. 23; “The Forgotten Promise of the Treaty of Niagara” on Jan. 31; “The Toronto ‘Purchase’ ” on Feb. 14; and “We are all Treaty People” on March 14.
All Fall 2021 seminars will be held from 11:30 a.m. to 1:30 p.m. via Zoom and live-streamed on the Polishing the Chain Facebook page. To register, visit bit.ly/39AO0PP.
This year’s EUC Seminar Series is co-presented by York’s new Centre for Indigenous Knowledges and Languages, the Indigenous Environmental Justice Project, and the Jumblies Theatre and Arts Talking Treaties project. For more information about the seminar series, email polishingthechain@gmail.com.




✨  
**Topic Probability**: 0.7002729177474976  
**Document**: EUC Seminar Series looks at Indigenous involvement in city planning, Nov. 23
The series’ third instalment, “Treaty Relations, Planning and Indigenous Consultation at the City of Toronto,” will take place on Tuesday, Nov. 23. Speaking at the event are Selina Young, director of the Indigenous Affairs Office for the City of Toronto; Leela Viswanathan, associate professor, School of Urban and Regional Planning, Queen’s University; and Bob Goulai, Niibisin Consulting.
Treaties, the Crown’s duty to consult and Ontario’s Provincial Planning Policy Statement have triggered new practices of Indigenous consultation and urban planning in Toronto. In this panel, speakers will discuss Indigenous planning and decision making in the Greater Toronto Area. To what extent does city planning include Indigenous nations and communities? To what extent do Indigenous Peoples have meaningful authority or decision-making power in relation to land and waters? To what extent does the city recognize and enable their ability to practise ceremony, plant and harvest food and medicines, or enact stewardship responsibilities?
Polishing the Chain: Treaty Relations in Toronto is a fall and winter conversation series that will bring together Indigenous and allied scholars, knowledge holders, artists, Earth workers, and activists who will explore the historical significance and contemporary relevance of the treaties Indigenous nations in southern Ontario have made with each other, with the land and with the Crown. It will explore: the spirit and intent of Toronto treaties; the ways Indigenous Peoples have upheld and continue to uphold them; the extent to which they are (and are not) reflected in contemporary Indigenous and state relations; and the treaty responsibilities of both settler and Indigenous Torontonians.
All Fall 2021 seminars will be held from 11:30 a.m. to 1:30 p.m. via Zoom and live-streamed on the Polishing the Chain Facebook page. To register, visit https://www.eventbrite.ca/e/treaty-relations-planning-**indigenous**-consultation-at-the-city-of-toronto-tickets-208769514237.
This year’s EUC Seminar Series is co-presented by York’s new Centre for Indigenous Knowledges and Languages, the Indigenous Environmental Justice Project, and the Jumblies Theatre and Arts Talking Treaties project. For more information about the seminar series, email polishingthechain@gmail.com.




✨  
**Topic Probability**: 0.6513477563858032  
**Document**: Learn about Indigenous **peoples** and sample bannock bread
Boozhoo, sekon, tansi, greetings. The Centre for Aboriginal Student Services (CASS) will be in Vari Hall Monday with a table providing samples of three sisters soup and bannock bread, as a way to promote awareness of Indigenous **peoples** and of the upcoming National Aboriginal Day – June 21.
CASS will hand out samples in Vari Hall, from 11:30am to 3pm. There will also be information on events happening downtown June 21.
“Three sisters soup is a traditional dish that not only tastes delicious, but it shows how advanced the Aboriginal **people** were with agriculture,” says Amber Wynne, student peer leader at CASS.  Wynne say the three sisters vegetable garden, comprised of corn, beans and squash, works together, as “the corn would take the nitrogen from the soil, which was readily replenished by the bean plant, and the squash leaves would keep the soil moist by protecting it from the sun”.
Inside the  Centre for Aboriginal Student Services 
In celebration of the 16th Annual National Aboriginal Day, CASS is bringing the event to the students, staff and faculty at York.
“It’s important to recognize and celebrate the rich culture and history behind Indigenous peoples, while promoting inclusion and unity among all nations,” says Jolene John, CASS administrative assistant. CASS would like to “share the diverse cultures of First Nations, Inuit and Metis **people** while also promoting awareness that Indigenous people are here and thriving,” says John.
The National Indian Brotherhood, now the Assembly of First Nations, **first** called for a National Aboriginal Day in 1982. It was to be a day that commemorated Aboriginal solidarity.
In 1995, The Sacred Assembly (a national conference of Aboriginal and non-Aboriginal people) once again called for a national holiday to celebrate the contributions of Aboriginal peoples. Finally, in 1996 former Governor General Romeo LeBlanc declared June 21 as National Aboriginal Day.
Some events that will be taking place in the Toronto area are as follows:
June 18 to 23
Art Show
Sponsored by the Toronto District School Board Aboriginal Education Centre
Toronto City Hall Rotunda, 100 Queen St. W.
June 21
National Aboriginal Day Sunrise Ceremony & Flag Raising
Podium Roof, Toronto City Hall, 100 Queen St. W.
5:30am
Celebrate the summer solstice with First Nations’ dancing, drumming and a sunset ceremony
Fort York National Historic Site, 250 Fort York Blvd.
6pm to sunset
All My Relations’ National Aboriginal Day
Allendale Gardens Park
12pm to 6pm
June 23
Na-Me-Rez Outdoor Traditional Powwow
Wells Hills Park, East of Bathurst & St. Clair West
Grand Entry starts at 12pm
June 29
Scarbourough Powwow
20 Waldock St., Scarborough, Ontario
Sunrise ceremony at 5:30am
Grand Entry at 12pm
June 30
Aboriginal History Month Celebration Event @ Dundas Square
Sponsored by the Native Canadian Centre of Toronto
12 to 8pm
For more information about celebrations that will be taking place around National Aboriginal Day, visit the Government of Canada’s Aboriginal Canada Portal website.
For more information on Aboriginal services, visit the CASS website or stop by the centre at 246 York Lanes.




✨  
**Topic Probability**: 0.6455286741256714  
**Document**: Spirit vigil raises awareness about violence against **indigenous** women
More than 50 students, staff and faculty recently marched from York’s Vari Hall to Osgoode Hall Law School to raise awareness about violence against **indigenous** women.
The Spirit Vigil to Honour Our Stolen Sisters, presented by the Aboriginal Students’ Association at York, was held in support of missing and murdered women in Canada, and stressed the need for a national inquiry.
Sisters in Spirit Vigils are held annually throughout Canada on Oct. 4 to raise public awareness about missing and murdered indigenous women and girls, with the goal of ending violence committed against all women. While estimates put the number of missing and murdered indigenous women in Canada at about 1,200 within the past 20 years, elder Laureen “Blu” Waters noted that in reality, this figure is much higher due to all of the violence that goes unreported.
ASAY Vice-President Karissa John shared a teaching she received by York alumna Megan Bertasson, current doctoral student at OISE and an active member in the Toronto indigenous community. The shared sentiment acknowledges: “Not only must we honour and remember our stolen sisters and their families, we also must recognize and raise awareness for the survivors of domestic violence and those women and girls that are still with us today. In doing so, we hope to bring healing to our families, our communities and our nations.”
Participants, including Professor Tania das Gupta’s students from her MIST 3680 course, carried placards, some with the names of missing and murdered indigenous women, as the crowd wound its way through Vari Hall and Central Square, and out towards Osgoode. The march ended at the York Tipi, where a ceremony, including a traditional prayer and sharing circle, was held to remember the missing and murdered women. Guest speakers included indigenous students, staff and faculty at York, and the circle was open to participants to share testimonies of friends and family members who have been deeply affected by violence.
Following the ceremony, participants gathered for a group photo in support of #IAmNotNext, a social media campaign empowering indigenous women by refusing to accept violence in their communities. Non-indigenous allies carried similar signs in support of indigenous women, declaring #IGotYourBack and #IStandByYou.
Students interested in learning more about the cause are welcome to join the Walking With Our Sisters Facebook page, follow Christi Belcourt’s commemorative art installation exhibits and participate in the annual Feb. 14 Strawberry Ceremony. For more information, visit the Centre for Aboriginal Student Services Facebook page or the Aboriginal Students’ Association at York Facebook page.
Students interested in taking York’s Indigenous Studies certificate option should contact the Department of Equity Studies at deqs@yorku.ca.




✨  
**Topic Probability**: 0.6427420973777771  
**Document**: National chief of the Assembly of First Nations to speak at York
Ask and he will come. It worked for the Osgoode Indigenous Students’ Association (OISA) at York. Phil Fontaine, national chief of the Assembly of First Nations, is coming to speak at York’s Osgoode Hall Law School simply because OISA asked him to.
Fontaine will speak on Friday, Jan. 25, at 10:30am in the Moot Court Room, 101 Osgoode Hall Law School, about some of the legal issues facing First Nations **people** today. The event is free and open to everyone.
"We are all very excited about him coming," said Hazel Herrington, OISA’s internal and external relations coordinator. "We are committed to bringing Aboriginal issues into the school."
Fontaine, an Ojibway from the Sagkeeng First Nation some 150 km north of Winnipeg, is serving his third term as national chief of the Assembly of First Nations. 
Left: Phil Fontaine
"OISA has a Visiting Elder Speaker Series which brings Aboriginal elders into Osgoode to share Aboriginal teachings," said Herrington. "Having Phil Fontaine come to Osgoode seemed like a logical extension of the work we are already involved in."
One of Fontaine’s **first** forays into advocating for First Nations rights came in his youth with the Canadian Indian Youth Council. He was then elected chief of Sagkeeng in 1972, a position he held for two consecutive terms.
With his guidance, the Sagkeeng First Nation was the **first** in Canada to have a locally-controlled education system and the first to have an Alcohol and Addictions Treatment Centre on its reserve. Sagkeeng also had one of the first Child & Family Services agencies.
Following his stint as chief of Sagkeeng, Fontaine moved with his family to the Yukon and served as the federal government’s regional director general of Indian and Northern Affairs Canada. When he returned to Manitoba in 1980, he was elected as the Manitoba regional chief for the Assembly of First Nations. Then in 1991, he became the elected grand chief of the Assembly of Manitoba Chiefs, a position he held for three consecutive terms.
Fontaine was first elected national chief of the Assembly of First Nations in 1997. In that position, he helped bring about the 1998 federal government’s Statement of Reconciliation, including a $350-million Healing Fund.
After his three-year term as chief was complete, Fontaine was appointed chief commissioner of the Indian Claims Commission (ICC) and it was in that role that he was instrumental in resolving the Kahkewistihaw First Nation’s outstanding 1907 land claim resulting in a $94.6-million settlement.
He left his position with the ICC and was then elected two more times for the position of national chief of the Assembly of First Nations, in 2003 and 2006, where he continues to work for the rights of First Nations **people**.
Fontaine holds several honorary doctorates and is a member of the Order of Manitoba.
OISA’s objectives are to promote legal education for **indigenous** people in a culturally appropriate learning environment, to raise awareness and provide support for **indigenous** initiatives within Osgoode and to support indigenous students within Osgoode. OISA also supports faculty’s advocacy or initiatives for the inclusion of indigenous perspectives, historical context, community experiences, knowledge, healing, cultural wisdom and insights within the school and curriculum. It also initiates formal and informal links between Osgoode and indigenous communities.
For more information about OISA, click here.


