<a href="https://colab.research.google.com/github/idantobis/Deep-Learning-Winter-2024/blob/main/projects.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## 1. Intelligent Document Scanning and Historical Analysis System for WWII Archives
   - **Description**: Develop a document scanning and analysis application specifically for the Museum of Jewish Soldiers in WWII. The project will focus on digitizing, recognizing, and categorizing documents and photos from the museum's archive. The system should use OCR to extract text, apply NLP to identify key entities (names, dates, locations), and categorize artifacts. This could include features to recognize specific military units, historical figures, or notable events. Extra functionality could involve cross-referencing names and events for a deeper historical analysis.
   - **Features**:
     - **Document Digitization**: Scanning and storing images of historical documents.
     - **OCR and Entity Recognition**: Extract text and recognize names, dates, military units, and keywords.
     - **Categorization and Tagging**: Automatically categorize documents by topic, year, or military division.
     - **Historical Insights**: Offer search and filtering tools, potentially with visualization, for exploring connections (e.g., people who served together, battles or operations in common).
   - **Technologies**: Python (Tesseract OCR for text extraction), NLP tools (spaCy, Named Entity Recognition models), Django/Flask for the back end, front-end visualization with JavaScript (React), and a database for storing, organizing, and retrieving scanned documents.
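
The entity-recognition and categorization steps can be sketched roughly as follows. This is a minimal illustration that assumes the OCR text is already available as plain strings; it uses regular expressions as a simple stand-in for a trained NER model such as spaCy's. The sample texts, the names `extract_entities` and `categorize_by_year`, and the patterns themselves are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical OCR output (made-up text); in the project this would come from Tesseract.
SAMPLE_OCR_TEXT = {
    "doc_001": "Pvt. A. Example joined the 3rd Infantry Division on 12 March 1943.",
    "doc_002": "Letter dated 5 June 1944 from Sgt. B. Sample, 82nd Rifle Brigade.",
}

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Dates written as "12 March 1943"; other formats would need extra patterns.
DATE_RE = re.compile(rf"\b(\d{{1,2}}) ({MONTHS}) (\d{{4}})\b")
# Units written as "3rd Infantry Division"; an assumption about the archive's wording.
UNIT_RE = re.compile(
    r"\b\d+(?:st|nd|rd|th) [A-Z][a-z]+ (?:Division|Brigade|Regiment|Battalion)\b"
)


def extract_entities(text):
    """Return the dates, years, and unit names found in one document's text."""
    date_matches = list(DATE_RE.finditer(text))
    return {
        "dates": [m.group(0) for m in date_matches],
        "years": [int(m.group(3)) for m in date_matches],
        "units": UNIT_RE.findall(text),
    }


def categorize_by_year(docs):
    """Map each year mentioned in the archive to the ids of documents mentioning it."""
    by_year = defaultdict(list)
    for doc_id, text in docs.items():
        for year in sorted(set(extract_entities(text)["years"])):
            by_year[year].append(doc_id)
    return dict(by_year)
```

A trained NER model would likely handle names, abbreviations, and OCR noise far better than fixed patterns; the sketch only shows how extracted entities could feed the tagging step.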

## 2. Interactive Historical Timeline and Map
   - **Description**: Develop an interactive timeline and map that visualizes events, troop movements, and notable battles involving Jewish soldiers during WWII. Users can explore events chronologically or by location, with rich details pulled from the museum's archive. This tool would allow visitors to trace individual journeys or view large-scale operations, combining educational insights with data visualization.
   - **Features**:
     - **Timeline View**: Display events, missions, or battles over time, allowing users to filter by date range.
     - **Map Overlay**: Show locations of notable battles, operations, and journeys across various WWII fronts.
     - **Detailed Pop-Ups**: Enable users to click on timeline events or map markers to view associated documents, photos, and personnel.
   - **Technologies**: JavaScript (D3.js, Leaflet for maps), Python (data extraction from documents), Flask/Django for the back end, and a database for storing location and time metadata.
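
One possible shape for the back-end side of the timeline filter and map overlay is sketched below. It assumes each event carries a date and a latitude/longitude pair; the `ArchiveEvent` class, the helper names, and the placeholder events are hypothetical. The output is a GeoJSON `FeatureCollection`, a format that Leaflet's `L.geoJSON` layer can display.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchiveEvent:
    title: str
    when: date
    lat: float
    lon: float


# Placeholder events with arbitrary coordinates, for illustration only.
EVENTS = [
    ArchiveEvent("Event A", date(1943, 7, 10), 37.0, 14.0),
    ArchiveEvent("Event B", date(1943, 2, 2), 48.7, 44.5),
    ArchiveEvent("Event C", date(1944, 6, 6), 49.4, -0.9),
]


def filter_by_date_range(events, start, end):
    """Return events with start <= when <= end, in chronological order."""
    return sorted((e for e in events if start <= e.when <= end),
                  key=lambda e: e.when)


def to_geojson(events):
    """Build a GeoJSON FeatureCollection; coordinates are [longitude, latitude]."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [e.lon, e.lat]},
                "properties": {"title": e.title, "date": e.when.isoformat()},
            }
            for e in events
        ],
    }
```

A Flask or Django view could serialize the result of `to_geojson(filter_by_date_range(...))` with `json.dumps` and return it to the front end, which keeps the date filtering on the server.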

## 3. Image Recognition and Tagging for Military Artifacts
   - **Description**: Develop a system that uses computer vision to identify and categorize military artifacts from WWII (e.g., uniforms, medals, insignia, weaponry) based on images in the archive. The system could automatically suggest tags or categories, helping the museum organize its collection more efficiently.
   - **Features**:
     - **Artifact Detection and Classification**: Recognize and label objects in photos, such as badges, medals, or vehicles, and classify them by type or origin.
     - **Metadata Association**: Link recognized artifacts with metadata (e.g., approximate year, region, military unit).
     - **Search by Artifact**: Allow users to search the database by specific artifact types or classifications.
   - **Technologies**: Python (TensorFlow/PyTorch for image recognition), OpenCV for preprocessing, Flask/Django for back-end support, and a front-end interface for browsing and searching.

## 4. Historical Document Clustering and Topic Discovery
   - **Description**: Create a system to automatically group and discover topics in the museum’s document archive. This project would use unsupervised machine learning to cluster similar documents, helping the museum identify themes, trends, and new insights within the collection.
   - **Features**:
     - **Text Preprocessing and Vectorization**: Clean and process text data for analysis.
     - **Topic Modeling**: Apply methods like Latent Dirichlet Allocation (LDA) to discover hidden topics.
     - **Visualization**: Present clusters in an interactive format, where users can click on topics to view associated documents.
   - **Technologies**: Python (scikit-learn, Gensim for topic modeling), NLP preprocessing tools, Flask/Django, and D3.js or Plotly for visualizations.
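
As a rough illustration of the preprocessing and vectorization step, the sketch below computes TF-IDF vectors and cosine similarity using only the standard library. It is not LDA; in the project, scikit-learn or Gensim would most likely replace this code. The sample documents and function names are hypothetical.

```python
import math
import re
from collections import Counter

# Made-up documents standing in for OCR'd archive text.
DOCS = [
    "supply convoy report for the northern front",
    "convoy supply schedule northern front",
    "letter home to family about the holidays",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document (tf * log(N / df))."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Documents whose vectors have high cosine similarity are candidates for the same cluster; a clustering or topic-modeling algorithm would then operate on these vectors or on raw term counts.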

## 5. Interactive Family Tree Builder for Soldier Connections
   - **Description**: Develop a tool that allows users to build interactive family trees or connection graphs of Jewish soldiers based on the archive data. This tool could help users uncover familial or unit-based connections between individuals, giving insights into family contributions to the war effort.
   - **Features**:
     - **Family and Military Connections**: Recognize familial relations (siblings, cousins) or military unit memberships and visualize them as nodes and edges.
     - **Search and Filter**: Allow users to search by name, unit, or date and view related family or military connections.
     - **Interactive Visualization**: Display connected individuals on an expandable tree, with links to documents or photos.
   - **Technologies**: Python (NLP for entity recognition), JavaScript (D3.js or Cytoscape.js for interactive graphs), Flask/Django for the back end, and a database to store relational data.
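
The nodes-and-edges idea could be prototyped along these lines. The sketch assumes entity recognition has already produced simple name/unit records; the record layout, the placeholder names, and the functions `build_unit_graph` and `neighbors` are hypothetical. Familial relations could be added as edges with a different `relation` value.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder records, as might be produced by an upstream extraction step.
RECORDS = [
    {"name": "Soldier A", "unit": "Unit 1"},
    {"name": "Soldier B", "unit": "Unit 1"},
    {"name": "Soldier C", "unit": "Unit 2"},
    {"name": "Soldier D", "unit": "Unit 1"},
]


def build_unit_graph(records):
    """Connect every pair of people who appear in the same unit."""
    members = defaultdict(set)
    for record in records:
        members[record["unit"]].add(record["name"])

    nodes = [{"id": name} for name in sorted({r["name"] for r in records})]
    edges = []
    for unit, names in sorted(members.items()):
        for a, b in combinations(sorted(names), 2):
            edges.append({"source": a, "target": b,
                          "relation": "same_unit", "unit": unit})
    return {"nodes": nodes, "edges": edges}


def neighbors(graph, name):
    """Return the sorted names directly connected to the given person."""
    found = set()
    for edge in graph["edges"]:
        if edge["source"] == name:
            found.add(edge["target"])
        elif edge["target"] == name:
            found.add(edge["source"])
    return sorted(found)
```

The resulting dictionary is plain JSON-serializable data, so a front-end graph library can be fed from it after whatever reshaping that library expects.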

## 6. Speech-to-Text Conversion for Oral Histories and Interviews
   - **Description**: Develop an application that converts audio interviews and oral histories into searchable text transcripts, making the museum’s audio resources accessible to more users and searchable by keyword.
   - **Features**:
     - **Speech Recognition**: Convert audio files into text, applying NLP to clean and structure the transcripts.
     - **Entity Recognition**: Identify key entities like names, locations, dates, and military terms within the transcripts.
     - **Searchable Interface**: Create a search tool for users to look up keywords, names, or specific battles within the transcripts.
   - **Technologies**: Python (SpeechRecognition or DeepSpeech for transcription), NLP (spaCy or NLTK for entity recognition), Flask/Django for the web interface, and storage for audio files and transcripts.
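
Once transcripts exist as text, the searchable interface might rest on something like an inverted index. The sketch below is one simple way to do keyword lookup, assuming transcripts are stored as plain strings keyed by an id; the sample transcripts and the names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict

# Made-up transcript snippets; real ones would come from the speech-to-text step.
TRANSCRIPTS = {
    "interview_01": "We crossed the river at night and reached the village by dawn.",
    "interview_02": "My brother wrote letters from the village every week.",
}

WORD_RE = re.compile(r"[a-z0-9']+")


def build_index(transcripts):
    """Map each lowercase word to the set of transcript ids that contain it."""
    index = defaultdict(set)
    for transcript_id, text in transcripts.items():
        for word in WORD_RE.findall(text.lower()):
            index[word].add(transcript_id)
    return index


def search(index, query):
    """Return sorted ids of transcripts containing every word in the query."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)
```

For example, `search(build_index(TRANSCRIPTS), "river village")` returns only the transcripts that mention both words. A production system would probably use a database's full-text search or a dedicated search engine instead.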

## 7. Augmented Reality (AR) Application for Museum Exhibits
   - **Description**: Create an AR app for use on tablets or mobile devices that museum visitors can use to scan physical exhibits and view additional digital content, like background information, related documents, or interactive 3D models.
   - **Features**:
     - **AR Content Overlays**: Recognize exhibits and overlay relevant digital content, including photos, documents, or videos.
     - **Historical Context**: Provide contextual information about each exhibit, such as the historical significance of specific uniforms or weapons.
     - **Interactive 3D Models**: Allow users to interact with digital replicas of artifacts or documents in 3D.
   - **Technologies**: ARKit/ARCore for augmented reality, Unity for app development, Python for back-end data handling, and cloud storage for multimedia content.

## 8. Interactive Testimonies Analysis Tool Using NLP
   - **Description**: Build a platform that uses NLP to analyze and summarize soldiers' testimonies, grouping related stories by themes such as camaraderie, resilience, or specific battles. Users could explore these testimonies through thematic connections.
   - **Features**:
     - **Testimony Extraction and Summarization**: Summarize longer texts to make them more accessible, while identifying core themes.
     - **Sentiment Analysis**: Analyze the sentiment to classify sections as positive, negative, or neutral.
     - **Theme Exploration**: Allow users to search by theme, emotion, or specific battles/events, viewing interconnected testimonies.
   - **Technologies**: Python (NLP for summarization and sentiment analysis, spaCy or Hugging Face Transformers), Flask/Django for the web interface, and data visualization in a meaningful, historical context.
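
To make the sentiment-classification feature concrete, here is a toy, lexicon-based sketch that labels each sentence of a testimony as positive, negative, or neutral. It is only a stand-in: a trained model (for example, one loaded through Hugging Face Transformers) would be expected to replace the tiny word lists. The lexicons, sample text, and function names are all hypothetical.

```python
import re

# Tiny illustrative word lists; not a validated sentiment lexicon.
POSITIVE = {"proud", "brave", "hope", "friendship", "relief"}
NEGATIVE = {"fear", "loss", "hunger", "cold", "grief"}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify(sentence):
    """Label a sentence by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def label_testimony(text):
    """Return (sentence, label) pairs for every sentence in the testimony."""
    return [(sentence, classify(sentence)) for sentence in split_sentences(text)]
```

Word counting ignores negation, irony, and context, which matter a great deal in personal testimonies, so the labels from a sketch like this should be treated as a rough first pass at most.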