# Obsidian Note Analyzer

This notebook demonstrates a simple text analysis tool for notes from an Obsidian vault. It reads plain-text (`.txt`) files from a directory and runs three basic natural language processing tasks on them: word frequency analysis, sentiment analysis, and topic modeling.

## Setup and Imports

First, let's import the necessary libraries and download required NLTK data.

In [None]:
# Import necessary libraries
import os
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import matplotlib.pyplot as plt
from wordcloud import WordCloud

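# Download necessary NLTK data
nltk.download('punkt')
# Recent NLTK releases load the tokenizer from 'punkt_tab' rather than 'punkt',
# so request both to cover either case
nltk.download('punkt_tab')
nltk.download('stopwords')
nltk.download('vader_lexicon')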

## Utility Functions

Now, let's define some utility functions for reading files and preprocessing text.

In [None]:
def read_files(directory):
    """Read all text files from a directory."""
    texts = []
    # Sort the listing so notes are read in the same order on every run
    for filename in sorted(os.listdir(directory)):
        if filename.endswith('.txt'):
            with open(os.path.join(directory, filename), 'r', encoding='utf-8') as file:
                texts.append(file.read())
    return texts

def preprocess_text(text):
    """Tokenize, lowercase, and remove stopwords from text."""
    tokens = word_tokenize(text.lower())
    stop_words = set(stopwords.words('english'))
    return [word for word in tokens if word.isalnum() and word not in stop_words]

## Word Frequency Analysis

This function counts how often each word appears across all texts after preprocessing.

In [None]:
def word_frequency(texts):
    """Calculate word frequency across all texts."""
    all_words = []
    for text in texts:
        all_words.extend(preprocess_text(text))
    return nltk.FreqDist(all_words)
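
`nltk.FreqDist` is a subclass of `collections.Counter`, so the value returned by `word_frequency` supports `most_common`. Here is a minimal sketch using a plain `Counter`; the token list is made up and only stands in for the output of `preprocess_text`.

```python
from collections import Counter

# Made-up tokens standing in for preprocess_text() output
tokens = ['project', 'idea', 'project', 'meeting', 'idea', 'project']

freq = Counter(tokens)

# The two most frequent words with their counts
top_two = freq.most_common(2)
print(top_two)
```

The same call on the real result, for example `word_frequency(texts).most_common(20)`, would list the twenty most frequent words in your notes.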

## Sentiment Analysis

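We'll use NLTK's `SentimentIntensityAnalyzer` (VADER) for basic sentiment analysis. For each text we keep the `compound` score, a single value between -1 (most negative) and +1 (most positive).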

In [None]:
def analyze_sentiment(texts):
    """Perform sentiment analysis on each text."""
    sia = SentimentIntensityAnalyzer()
    return [sia.polarity_scores(text)['compound'] for text in texts]
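
The compound scores can be hard to read as raw numbers. As a rough sketch, they could be bucketed into coarse labels. The `label_sentiment` helper is hypothetical, and the ±0.05 cutoffs are an assumption based on a commonly used VADER convention; neither is part of the notebook's own code.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a compound score to a coarse label (hypothetical helper)."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

# Example scores standing in for the output of analyze_sentiment(texts)
example_scores = [0.62, -0.41, 0.0, 0.03]
labels = [label_sentiment(score) for score in example_scores]
print(labels)
```

Adjust `threshold` if your notes tend to cluster near zero; a wider neutral band may suit long, mixed-tone notes better.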

## Topic Modeling

This function performs simple topic modeling: it fits a Latent Dirichlet Allocation (LDA) model on word counts and returns the top words for each topic.

In [None]:
def topic_modeling(texts, n_topics=5, n_top_words=10):
    """Perform topic modeling on the texts."""
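    # min_df=2 keeps only words found in at least two notes; max_df=0.95 drops words found in over 95% of notes.
    # With very few notes, these filters can leave an empty vocabulary and raise a ValueError.
    vectorizer = CountVectorizer(max_df=0.95, min_df=2, stop_words='english')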
    doc_term_matrix = vectorizer.fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=42)
    lda.fit(doc_term_matrix)
    
    feature_names = vectorizer.get_feature_names_out()
    topics = []
    for topic in lda.components_:
        # argsort is ascending, so the reversed slice picks the highest-weight indices first
        top_words = [feature_names[i] for i in topic.argsort()[:-n_top_words - 1:-1]]
        topics.append(top_words)
    return topics
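
The slice `topic.argsort()[:-n_top_words - 1:-1]` selects the indices of the largest weights in descending order. The same selection can be sketched in plain Python. The `top_words_for_topic` helper, the vocabulary, and the weights below are all made up for illustration.

```python
def top_words_for_topic(weights, feature_names, n_top_words):
    """Return the n_top_words terms with the largest weights (hypothetical helper)."""
    # Rank indices by weight, largest first, then keep the first n_top_words
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [feature_names[i] for i in ranked[:n_top_words]]

# Made-up vocabulary and one made-up row of topic weights
vocab = ['garden', 'budget', 'recipe', 'travel']
weights = [0.2, 3.1, 1.4, 0.7]

top = top_words_for_topic(weights, vocab, 2)
print(top)
```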

## Main Analysis Function

This function brings everything together: it reads the notes, displays a word cloud and a histogram of sentiment scores, and prints the top words for each topic.

In [None]:
def analyze_notes(directory):
    """Main function to analyze notes."""
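    texts = read_files(directory)
    # Stop early if there is nothing to analyze; the word cloud cannot be built from empty input
    if not texts:
        print(f"No .txt files found in {directory!r}.")
        return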
    
    # Word frequency
    freq_dist = word_frequency(texts)
    
    # Create and display word cloud
    wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(freq_dist)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')
    plt.title('Word Cloud of Your Notes')
    plt.show()
    
    # Sentiment analysis
    sentiments = analyze_sentiment(texts)
    plt.figure(figsize=(10, 5))
    plt.hist(sentiments, bins=20)
    plt.title('Sentiment Distribution of Your Notes')
    plt.xlabel('Sentiment Score')
    plt.ylabel('Frequency')
    plt.show()
    
    # Topic modeling
    topics = topic_modeling(texts)
    for i, topic in enumerate(topics):
        print(f"Topic {i + 1}: {', '.join(topic)}")

## Run the Analysis

Finally, let's run our analysis on a directory of notes.

In [None]:
# Replace 'path/to/your/notes' with the actual path to your text files
notes_directory = 'path/to/your/notes'
analyze_notes(notes_directory)
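
Because `read_files` calls `os.listdir`, running the cell with the placeholder path raises `FileNotFoundError`. A small check before running could look like the sketch below. The `count_text_files` helper is hypothetical, and the temporary directory only stands in for a real notes folder.

```python
import os
import tempfile

def count_text_files(directory):
    """Return the number of .txt files in directory, or None if it does not exist."""
    if not os.path.isdir(directory):
        return None
    return sum(1 for name in os.listdir(directory) if name.endswith('.txt'))

# Demonstrate on a temporary directory with two .txt notes and one Markdown note
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.txt', 'b.txt', 'c.md'):
        with open(os.path.join(tmp, name), 'w', encoding='utf-8') as f:
            f.write('sample note')
    found = count_text_files(tmp)
    missing = count_text_files(os.path.join(tmp, 'does-not-exist'))

print(found, missing)
```

A result of `None` or `0` for your own path suggests fixing `notes_directory` before calling `analyze_notes`.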

## Next Steps

This notebook provides a foundation for analyzing text data from your notes. Here are some ideas for expanding on this project:

1. Modify the code to read Markdown files directly from your Obsidian vault.
2. Implement more advanced NLP techniques like named entity recognition or text summarization.
3. Create a function to find similar notes based on content.
4. Develop a simple search functionality using the processed text data.
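
As a starting point for the first idea, here is a minimal sketch that reads `.md` files recursively and strips two pieces of Obsidian-style markup: YAML front matter and `[[wikilinks]]`. The helper names and regular expressions are assumptions for illustration and do not cover every syntax a vault may contain (embeds, tags, and callouts are left untouched).

```python
import re
import tempfile
from pathlib import Path

# Front matter: a block delimited by '---' lines at the very start of the file
FRONT_MATTER = re.compile(r'\A---\n.*?\n---\n', re.DOTALL)
# Wikilinks: [[Target]] or [[Target|Alias]]
WIKILINK = re.compile(r'\[\[([^\]|]+)(?:\|([^\]]+))?\]\]')

def clean_markdown(text):
    """Strip YAML front matter and reduce [[wikilinks]] to their visible text."""
    text = FRONT_MATTER.sub('', text)
    # [[Target|Alias]] -> Alias, [[Target]] -> Target
    return WIKILINK.sub(lambda m: m.group(2) or m.group(1), text)

def read_markdown_notes(vault_dir):
    """Recursively read every .md file under vault_dir, sorted by path."""
    return [clean_markdown(p.read_text(encoding='utf-8'))
            for p in sorted(Path(vault_dir).rglob('*.md'))]

# Demonstrate on a temporary directory standing in for a vault
with tempfile.TemporaryDirectory() as vault:
    Path(vault, 'idea.md').write_text(
        '---\ntags: [demo]\n---\nSee [[Project Plan|the plan]] and [[Inbox]].\n',
        encoding='utf-8')
    notes = read_markdown_notes(vault)

print(notes)
```

The returned list has the same shape as the output of `read_files`, so it could be passed to `word_frequency`, `analyze_sentiment`, or `topic_modeling` unchanged.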

Remember to experiment and adapt the code to better suit your specific needs and interests!
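
As one illustration of the third idea, similar notes could be found by comparing word counts with cosine similarity. This is a minimal sketch using only the standard library. The function names are hypothetical, and the token lists are made up to stand in for the output of `preprocess_text`.

```python
import math
from collections import Counter

def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word-count mappings."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(index, token_lists):
    """Return (other_index, score) for the note most similar to token_lists[index]."""
    counts = [Counter(tokens) for tokens in token_lists]
    scores = [(j, cosine_similarity(counts[index], counts[j]))
              for j in range(len(counts)) if j != index]
    return max(scores, key=lambda pair: pair[1])

# Made-up preprocessed notes; at least two notes are required
notes = [
    ['garden', 'tomato', 'soil'],
    ['budget', 'rent', 'savings'],
    ['garden', 'soil', 'compost'],
]
best_index, best_score = most_similar(0, notes)
print(best_index, round(best_score, 3))
```

Raw counts favor long notes with common words; weighting terms with TF-IDF is a common refinement if the results look noisy.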