Science Sync

This system is designed to improve the management, access, and analysis of Google Scholar alert emails by integrating several automated processes. Its core functionality is described below.

Overview

Email Integration:

Gmail and Outlook Connectivity: The system connects to your Gmail or Outlook account to retrieve Google Scholar alerts.

Alert Retrieval and Parsing

Google Scholar alerts are automatically fetched and parsed using Beautiful Soup and regular expressions (regex) to extract relevant information such as titles, authors, publication dates, and links.
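The parsing step can be illustrated with Python's standard library. This is a minimal sketch, not the project's parser: it swaps Beautiful Soup for the built-in html.parser so it runs without extra dependencies, and it assumes a simplified alert layout in which each title is a link inside an h3 element, followed by an author line of the form "Authors - Venue, Year". Real Google Scholar alert markup may differ, and the AlertParser and parse_alert names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List

# Matches a four-digit year such as 1998 or 2023.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


class AlertParser(HTMLParser):
    """Collects title, link, authors and year for each article in an alert.

    Assumed (hypothetical) layout:
        <h3><a href="URL">Title</a></h3>
        <div>Author A, Author B - Venue, 2023</div>
    """

    def __init__(self) -> None:
        super().__init__()
        self.articles: List[Dict[str, str]] = []
        self._in_h3 = False
        self._in_link = False
        self._expect_authors = False

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3:
            self._in_link = True
            href = dict(attrs).get("href") or ""
            self.articles.append({"title": "", "link": href, "authors": "", "year": ""})

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "h3":
            self._in_h3 = False
            # The next non-empty text after the title is treated as the author line.
            self._expect_authors = bool(self.articles) and not self.articles[-1]["authors"]

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_link:
            current = self.articles[-1]
            # Join title fragments split by inline tags such as <b>.
            current["title"] = f"{current['title']} {text}".strip()
        elif self._expect_authors:
            current = self.articles[-1]
            current["authors"] = text.split(" - ")[0].strip()
            match = YEAR_RE.search(text)
            current["year"] = match.group(0) if match else ""
            self._expect_authors = False


def parse_alert(html: str) -> List[Dict[str, str]]:
    """Return one dict per article found in the alert HTML."""
    parser = AlertParser()
    parser.feed(html)
    parser.close()
    return parser.articles
```

Each returned dict has the keys title, link, authors, and year, which maps directly onto a database row in the storage step.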

Data Storage:

The parsed information is stored in an in-memory SQLite database (via Python's sqlite3 module) for easy access and further processing.
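A minimal sketch of the storage step using the standard sqlite3 module, assuming a single table whose columns mirror the fields named in the parsing step (title, authors, year, link). The column layout and the create_article_table and store_articles helpers are hypothetical; the table name plays the role of the table_name parameter set in app.py.

```python
import sqlite3
from typing import Iterable, Mapping


def create_article_table(conn: sqlite3.Connection, table_name: str) -> None:
    # Identifiers cannot be bound as SQL parameters, so validate and quote the name.
    if not table_name.isidentifier():
        raise ValueError(f"invalid table name: {table_name!r}")
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table_name}" ('
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "title TEXT NOT NULL, authors TEXT, year TEXT, link TEXT UNIQUE)"
    )


def store_articles(conn: sqlite3.Connection, table_name: str,
                   articles: Iterable[Mapping[str, str]]) -> None:
    create_article_table(conn, table_name)
    # INSERT OR IGNORE skips rows whose link is already stored, so an article
    # that appears in several alerts is kept only once.
    conn.executemany(
        f'INSERT OR IGNORE INTO "{table_name}" (title, authors, year, link) '
        "VALUES (:title, :authors, :year, :link)",
        articles,
    )
    conn.commit()


# ":memory:" creates a database that lives only as long as the connection.
conn = sqlite3.connect(":memory:")
```

Making link UNIQUE is one simple way to deduplicate articles that several alerts report; another key (for example title plus year) would work similarly.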

Machine Learning-Based Analysis:

Clustering:

The system employs clustering algorithms (such as KMeans, KMedoids, and Agglomerative Clustering) to group similar articles.
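To show how a medoid-based method groups articles, here is a minimal k-medoids sketch working on a precomputed distance matrix (for example, 1 minus a pairwise similarity). It is the simple alternating (Voronoi-iteration) variant rather than PAM, and it seeds with the first k items for determinism, which is a simplification; the KMedoids++ option listed among the parameters uses smarter seeding. The k_medoids name is hypothetical and this is not the project's implementation.

```python
from typing import List, Sequence, Tuple


def _assign(dist: Sequence[Sequence[float]], medoids: List[int]) -> List[int]:
    """Assign every item to the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist: Sequence[Sequence[float]], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Return (medoid item indices, cluster label per item)."""
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    medoids = list(range(k))  # simplistic deterministic seeding
    for _ in range(max_iter):
        labels = _assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # The new medoid is the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return medoids, _assign(dist, medoids)
```

Because it only needs distances, the same function works with any of the similarity metrics, which is the usual reason to prefer medoids over KMeans centroids for set-based data.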

Similarity Metrics:

Jaccard similarity or Euclidean distance can be used to measure the similarity between articles based on their references.
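Treating each article's references as a set, Jaccard similarity and the dice and sokal and sneath options listed under the app.py parameters can be written as below. This is a sketch: representing references as sets of identifiers is an assumption, and returning 0.0 when both sets are empty is an arbitrary convention chosen here.

```python
from typing import AbstractSet, Hashable, Tuple

Refs = AbstractSet[Hashable]


def _counts(x: Refs, y: Refs) -> Tuple[int, int, int]:
    """Return (shared items, items only in x, items only in y)."""
    return len(x & y), len(x - y), len(y - x)


def jaccard(x: Refs, y: Refs) -> float:
    # size of intersection / size of union
    union = len(x | y)
    return len(x & y) / union if union else 0.0


def dice(x: Refs, y: Refs) -> float:
    # 2 * size of intersection / (size of x + size of y)
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 0.0


def sokal_sneath(x: Refs, y: Refs) -> float:
    # shared / (shared + 2 * (only_x + only_y)); penalises mismatches twice
    shared, only_x, only_y = _counts(x, y)
    denom = shared + 2 * (only_x + only_y)
    return shared / denom if denom else 0.0
```

For two articles citing {r1, r2, r3} and {r2, r3, r4}, Jaccard gives 0.5, Dice about 0.667, and Sokal and Sneath about 0.333. Subtracting a similarity from 1 gives a distance suitable for clustering.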

Interfaces:

The user interfaces are built with CustomTkinter and let users interact with the system.

Results Display: The system provides intuitive visualization tools to display clustering results and other analytical insights.

Pre-Requisites

- Python 3.10 or higher
  - Official website for download: https://www.python.org/doc/
- pip (for installing all dependencies)
  - pip installation guide: https://pip.pypa.io/en/stable/installation/
- Your preferred IDE, such as Visual Studio Code
  - VS Code download page: https://code.visualstudio.com/Download
  - (recommended) Python extension from the VS Code Marketplace: https://code.visualstudio.com/docs/editor/extension-marketplace
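If you are unsure which interpreter version is installed, a quick check like the following sketch compares the running interpreter against the 3.10 minimum stated above (running python --version in a terminal gives the same information). The python_is_supported name is hypothetical.

```python
import sys
from typing import Optional, Sequence

MINIMUM = (3, 10)


def python_is_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= MINIMUM


if __name__ == "__main__":
    status = "OK" if python_is_supported() else "too old, please upgrade"
    print(f"Python {sys.version.split()[0]}: {status}")
```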

How To Run

a. Clone the repository (git is required):
```
git clone https://github.com/rohra-mehak/ScienceSync.git
```
```
cd ScienceSync
```
b. Alternatively, download the code:

Navigate to: https://github.com/rohra-mehak/ScienceSync

Click the Code button.

Select Download ZIP.

Extract the ZIP file to your desired location.
Navigate to the root folder directory:
```
cd yourpath/to/ScienceSync
```
On Linux, macOS, or Windows, create the secrets and database directories by running the mkdir command followed by the directory name in your IDE's terminal:
```
mkdir secrets
```
```
mkdir database
```
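If you prefer not to use the shell, the same two directories can be created from Python. This is an optional, cross-platform alternative to the mkdir commands above; exist_ok=True makes it safe to run more than once. The create_project_dirs name is hypothetical.

```python
from pathlib import Path
from typing import List


def create_project_dirs(root: Path) -> List[Path]:
    """Create the secrets/ and database/ folders under the ScienceSync root."""
    created = []
    for name in ("secrets", "database"):
        path = root / name
        path.mkdir(exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created
```

Calling create_project_dirs(Path.cwd()) from the ScienceSync root folder has the same effect as the two mkdir commands.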
Configure a virtual environment

In your IDE, make sure you are in the ScienceSync directory, then open the terminal window and run the following commands.

Example for VS Code:

Create a virtual environment directory called venv in the root ScienceSync directory:
```
python -m venv venv
```
On Windows PowerShell only, you may first need to allow scripts to run in the current session so that the activation script can execute. This command does not apply to other operating systems:
```
Set-ExecutionPolicy Unrestricted -Scope Process
```
Activate the Environment

Windows
```
.\venv\Scripts\activate
```
macOS / Linux
```
source venv/bin/activate
```
Once the environment is activated, you may see a (venv) prefix at the start of your command prompt.
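To confirm from Python itself that the virtual environment is active, the following small check can help. It relies on the documented behaviour of the venv module: inside a virtual environment, sys.prefix points at the environment while sys.base_prefix still points at the base installation. The in_virtualenv name is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv-created virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)  # should point into the venv folder
```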
Install all dependencies

Run the following command and wait for all dependencies to finish installing:
```
pip install -r requirements.txt
```
Configure the IDE to use the Virtual Environment

To make sure your IDE uses the Python interpreter from your virtual environment, configure the IDE to recognize and use that environment. Here is the approach for VS Code:

Visual Studio Code (VS Code)
1. Open Command Palette:
  - Press Cmd+Shift+P (macOS) or Ctrl+Shift+P (Windows/Linux) to open the command palette.
2. Select Interpreter:
  - Type Python: Select Interpreter -> Enter Interpreter Path -> Find Interpreter.
3. Choose Virtual Environment:
  - Select the interpreter located in your virtual environment (venv) directory. It will typically look like ./venv/bin/python or .\venv\Scripts\python.exe on Windows.
Configuring Credentials (GoogleAPI or GraphAPI)

To access your email account, you will need to obtain your own client ID and client secret. Depending on your email service (Outlook or Gmail), follow the appropriate steps below:

a. Accessing Outlook (using MS Graph)

Register Your Application: Follow the Microsoft documentation (Register an app) to register your application and obtain the necessary credentials, and grant it the Mail.Read, Mail.ReadWrite, and User.Read API permissions.
Save Credentials: Once you have your application ID and client secret, save them in a file named credentials_msgraph.json in the ScienceSync/secrets directory. The file should have the following format:
```
{
  "application_id": "your_app_id",
  "client_secret": "your_client_secret"
}
```
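Before running the app, you can sanity-check this file with a short script such as the one below. It is a sketch that only verifies that the two keys shown above are present and non-empty; the function names are hypothetical, and the application may perform its own checks.

```python
import json
from pathlib import Path
from typing import Dict

REQUIRED_KEYS = ("application_id", "client_secret")


def validate_msgraph_credentials(data: Dict[str, str]) -> Dict[str, str]:
    """Raise ValueError if a required key is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError("missing or empty keys: " + ", ".join(missing))
    return data


def load_msgraph_credentials(
    path: Path = Path("secrets/credentials_msgraph.json"),
) -> Dict[str, str]:
    """Read the JSON file (path relative to the ScienceSync root) and validate it."""
    with path.open(encoding="utf-8") as fh:
        return validate_msgraph_credentials(json.load(fh))
```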

b. Accessing Gmail (using Google API)

Set Up Your Environment: Follow only the "Set up your environment" section of the Google documentation to register your application and obtain the necessary credentials.
Download and Save Credentials: After registering, download the JSON file containing your credentials. Save this file as credentials.json in the ScienceSync/secrets directory.
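A quick check of credentials.json can catch a wrong download. OAuth client files downloaded from the Google Cloud console nest their fields under an "installed" key (desktop apps) or a "web" key (web apps). The sketch below accepts either and confirms that client_id and client_secret are present; which client type the project expects is not stated here, so treat the accepted keys as an assumption, and the function names as hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def validate_google_client(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return the client section, raising ValueError if it looks wrong."""
    for client_type in ("installed", "web"):
        if client_type in data:
            client = data[client_type]
            break
    else:
        raise ValueError('expected an "installed" or "web" section')
    missing = [key for key in ("client_id", "client_secret") if not client.get(key)]
    if missing:
        raise ValueError("missing or empty keys: " + ", ".join(missing))
    return client


def load_google_client(path: Path = Path("secrets/credentials.json")) -> Dict[str, Any]:
    """Read the downloaded OAuth client file and validate it."""
    with path.open(encoding="utf-8") as fh:
        return validate_google_client(json.load(fh))
```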

Additional information on working with Google APIs is available in Getting started with Google APIs.

Once these steps are complete, your credentials are configured for accessing your email account through either MS Graph or the Google API.

Navigate to ScienceSync/app.py

Several parameters in this file can be set before running the program. However, it is recommended to leave the default values as they are.

- days_ago: number of days to look back when searching the mailbox
- table_name: name of the table that the system creates and uses in your article database
- n_clusters: number of groups to divide the articles into when clustering
- method: clustering method (KMedoids / KMedoids++ / Agglomerative (average linkage) / Agglomerative (complete linkage))
- metric: similarity metric (dice / jaccard / sokal and sneath)
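The constraints implied by these parameters can be captured in a small settings object, sketched below. The RunSettings class, the exact option strings, and the validation rules are hypothetical: the real app.py may spell the options differently and defines its own defaults, which are not reproduced here. The cutoff helper shows how days_ago translates into the earliest date of alerts to fetch.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Option spellings are illustrative; app.py may use different strings.
METHODS = {
    "KMedoids",
    "KMedoids++",
    "Agglomerative (average linkage)",
    "Agglomerative (complete linkage)",
}
METRICS = {"dice", "jaccard", "sokal and sneath"}


@dataclass(frozen=True)
class RunSettings:
    days_ago: int      # how far back to search the mailbox
    table_name: str    # SQLite table created and used by the system
    n_clusters: int    # number of article groups
    method: str        # clustering method
    metric: str        # similarity metric

    def __post_init__(self) -> None:
        if self.days_ago < 1:
            raise ValueError("days_ago must be at least 1")
        if not self.table_name.isidentifier():
            raise ValueError("table_name must be a valid identifier")
        if self.n_clusters < 1:
            raise ValueError("n_clusters must be at least 1")
        if self.method not in METHODS:
            raise ValueError(f"unknown method: {self.method!r}")
        if self.metric not in METRICS:
            raise ValueError(f"unknown metric: {self.metric!r}")

    def cutoff(self, today: Optional[date] = None) -> date:
        """Earliest date whose alerts fall inside the look-back window."""
        return (today or date.today()) - timedelta(days=self.days_ago)
```

Validating up front means a typo in method or metric fails immediately instead of after the mailbox has been read.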

Running the main file

Once all the steps above are complete and all dependencies are installed, make sure you are in the root ScienceSync directory. To start the program, run the following command in your terminal:

```
python app.py
```

Sample Snapshots

Arrows in the screenshots are illustrative indicators only.

Initial Screen

This screen may differ depending on the chosen service and on whether the program can locate your credentials.

Redirection to Authorisation

Authorisation continues in your browser, and the exact steps depend on the service you chose. Meanwhile, the initial screen keeps you updated on the system's progress and on any errors encountered.

Logs can be used to diagnose problems: they record the exact file, method, and line where an exception or error occurred.
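For reference, Python's standard logging module can record the file, function, and line of each message through format fields, as in the sketch below. It assumes the project's logs are produced in a similar way; the format string, logger name, and fetch_alerts function are hypothetical illustrations.

```python
import io
import logging

# filename, funcName and lineno are standard LogRecord attributes.
LOG_FORMAT = "%(asctime)s %(levelname)s %(filename)s:%(lineno)d in %(funcName)s: %(message)s"


def make_logger(stream) -> logging.Logger:
    logger = logging.getLogger("sciencesync.demo")
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.handlers = [handler]  # avoid duplicate handlers on repeated calls
    logger.propagate = False
    return logger


def fetch_alerts(logger: logging.Logger) -> None:
    """Hypothetical step that fails, to show what a logged error looks like."""
    try:
        raise ConnectionError("mailbox unreachable")
    except ConnectionError:
        # exception() logs at ERROR level and appends the traceback.
        logger.exception("Failed to fetch alerts")


if __name__ == "__main__":
    buffer = io.StringIO()
    fetch_alerts(make_logger(buffer))
    print(buffer.getvalue())
```

Swapping the StreamHandler for a logging.FileHandler would write the same format to a file instead.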

Wait for the process to finish executing and for the results interface to load.


After the process finishes, click All Data to view an itemised list of all extracted articles.

The itemised list of articles is scrollable. Click a single article to view more information.

Article information appears in the right-hand tab, which also offers options to navigate to the article and save it on Google Scholar.

Additional export options are available below, along with display settings for UI scaling and themes.

Similarly, you can view the article groups and their related articles.
