In [None]:
import sys
sys.path.append("/Users/okanyenigun/Desktop/codes/reals/pypi_pkgs/v1/")

In [1]:
import modulift as mf

### `search_by_keywords`

Search for PyPI packages by one or more keywords, with control over how terms are combined and ranked.

#### Signature
```python
def search_by_keywords(
    *args,
    relation: Literal["and", "or"] = "or",
    limit: int = 5,
    method: Literal["exact", "jaccard"] = "exact",
    markdown: bool = False,
) -> List[Dict]:
```

#### What it does
- Supports two matching methods:
  - **exact**: simple substring inclusion
  - **jaccard**: set-based Jaccard similarity
- Allows combining keywords with **OR** (any match) or **AND** (all match).
- Optionally prints results as a Markdown table.
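
The interaction between `method="exact"` and `relation` can be pictured with a minimal sketch. This is not modulift's implementation: the helper `exact_match`, the toy `packages` list, and the choice to match against the `keywords` field are all assumptions made for illustration.

```python
from typing import Dict, List


def exact_match(package: Dict[str, str], terms: List[str], relation: str = "or") -> bool:
    """Hypothetical helper: substring inclusion against the keywords string."""
    haystack = package["keywords"].lower()
    hits = [term.lower() in haystack for term in terms]
    # "and" requires every term to be present, "or" requires at least one.
    return all(hits) if relation == "and" else any(hits)


# Toy catalog (made-up entries, not real search results).
packages = [
    {"package": "pkg-a", "keywords": "data visualization, chart, plot"},
    {"package": "pkg-b", "keywords": "live plotting, thread safety"},
    {"package": "pkg-c", "keywords": "hydrology, water resources"},
]

or_hits = [p["package"] for p in packages if exact_match(p, ["chart", "plot"], "or")]
and_hits = [p["package"] for p in packages if exact_match(p, ["chart", "plot"], "and")]

print(or_hits)   # ['pkg-a', 'pkg-b']
print(and_hits)  # ['pkg-a']
```

Because the match is substring-based, `"plot"` also hits `"live plotting"`, which is why `pkg-b` appears under **OR** but not under **AND**.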


In [2]:
result = mf.search_by_keywords("synthetic data")

In [3]:
print(result)

[{'package': 'autodatagen', 'description': 'automates the generation of synthetic data for machine learning and data science projects.', 'keywords': 'synthetic data, data generation, machine learning, data augmentation, datasets, ai, ml', 'popularity': 5}, {'package': 'gensysco', 'description': 'the gensysco library provides tools for generating and manipulating synthetic data, particularly useful for machine learning tasks like model training and evaluation.', 'keywords': 'synthetic data, machine learning, data generation, dataset creation', 'popularity': 5}, {'package': 'table-evaluator', 'description': 'tableevaluator is a python library designed to assess the similarity between synthetic datasets and real data, providing insights into the authenticity of generated data. it is particularly useful in fields like finance, healthcare, and government, where high-quality synthetic data can be created without privacy concerns. the library offers a consistent evaluation method for models g

Search for packages that contain all of the keywords "data science", "machine learning", and "deep learning", and display the top results as a Markdown-formatted table.

In [4]:
result = mf.search_by_keywords("data science", "machine learning", "deep learning", relation="and", markdown=True)

**Package:** luma-ml

**Description:** luma is a python library designed for machine learning and data science. it provides tools for data analysis, model building, evaluation, and deployment.

**Keywords:** machine learning, data science, classification, clustering, model building, evaluation, deep learning, pip install luma-ml

**Popularity:** 5


---
**Package:** neuralhydrology

**Description:** a python library for training neural networks with a focus on hydrological applications.

**Keywords:** machine learning, deep learning, hydrology, water resources, data science

**Popularity:** 5


---
**Package:** oifits

**Description:** a python package for handling oifits data, simplifying access and processing of vlbi data for machine learning and deep learning applications.

**Keywords:** oifits, vlbi, data science, machine learning, deep learning, astrophysics

**Popularity:** 5


---
**Package:** omniverse

**Description:** omniverse is a comprehensive library designed for machine learning, deep learning, data science, and software engineering explorations. it provides tools and resources for building and running docker images, managing ci/cd workflows with github actions, and creating jupyter book documentation, making it a versatile choice for developers and data scientists looking to streamline their workflows and enhance their projects.

**Keywords:** machine learning, deep learning, data science, software engineering, docker, ci/cd, jupyter book

**Popularity:** 5


---
**Package:** paddlex

**Description:** the paddlex toolkit provides a comprehensive end-to-end development environment for paddlepaddle, a powerful open-source machine learning framework.

**Keywords:** paddlepaddle, machine learning, deep learning, data science, ai

**Popularity:** 5


---


**Exact match (`method="exact"`)**

In [5]:
results_exact = mf.search_by_keywords(
    "chart", "plot", method="exact", markdown=True, limit=1
)

**Package:** pglive

**Description:** pglive provides thread-safe live plotting capabilities based on pyqtgraph for python applications.

**Keywords:** pyqtgraph, live plotting, data visualization, thread safety, qthread

**Popularity:** 5


---


**Using Jaccard similarity**

In [6]:
results_jaccard = mf.search_by_keywords(
    "chart", "plot", method="jaccard", markdown=True, limit=2
)

**Package:** cc-lvs

**Description:** a python library for creating and managing a variety of data visualization tools.

**Keywords:** data visualization, visualization, chart, plot

**Popularity:** 4

**Jaccard_score:** 0.5


---
**Package:** VizPack

**Description:** vizpack is a python visualization package designed for creating insightful plots and charts from pandas dataframes. it provides tools to label, annotate, and customize visualizations for effective data exploration.

**Keywords:** pandas, data visualization, visualization, plot, chart

**Popularity:** 1

**Jaccard_score:** 0.4


---
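
The two `Jaccard_score` values shown for `cc-lvs` and `VizPack` are consistent with Jaccard similarity computed between the set of query terms and the set of comma-separated package keywords. The sketch below reproduces them under that assumption; the `jaccard` helper is hypothetical and is not taken from modulift's source.

```python
from typing import Iterable


def jaccard(query_terms: Iterable[str], package_keywords: str) -> float:
    """Hypothetical helper: |A ∩ B| / |A ∪ B| over normalized keyword sets."""
    a = {term.strip().lower() for term in query_terms}
    b = {kw.strip().lower() for kw in package_keywords.split(",")}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


query = ["chart", "plot"]

# Keyword strings copied from the cc-lvs and VizPack results.
score_cc_lvs = jaccard(query, "data visualization, visualization, chart, plot")
score_vizpack = jaccard(query, "pandas, data visualization, visualization, plot, chart")

print(score_cc_lvs)   # 0.5  (2 shared terms / 4 distinct terms)
print(score_vizpack)  # 0.4  (2 shared terms / 5 distinct terms)
```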


### `search_by_package_name`

Search for a single PyPI package by its exact name, with optional Markdown formatting.

#### Signature
```python
def search_by_package_name(
    package_name: str,
    markdown: bool = False,
) -> Dict[str, str]:
```

#### What it does
- Performs a case-sensitive match first; if no match is found, falls back to a case-insensitive search.
- Optionally prints results as a Markdown table.
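
The two-stage lookup described above can be sketched as follows. The function `lookup_by_name` and the `catalog` list are hypothetical stand-ins; how modulift stores and queries its package data is not shown in this document.

```python
from typing import Dict, List, Optional


def lookup_by_name(name: str, packages: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Hypothetical helper: case-sensitive match first, then case-insensitive."""
    # Pass 1: exact, case-sensitive comparison.
    for pkg in packages:
        if pkg["package"] == name:
            return pkg
    # Pass 2: fall back to a case-insensitive comparison.
    lowered = name.lower()
    for pkg in packages:
        if pkg["package"].lower() == lowered:
            return pkg
    return None


# Toy catalog for illustration only.
catalog = [{"package": "VizPack"}, {"package": "pandas"}]

exact = lookup_by_name("pandas", catalog)            # found in pass 1
fallback = lookup_by_name("vizpack", catalog)        # found in pass 2
missing = lookup_by_name("no-such-package", catalog)

print(exact, fallback, missing)
```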


In [7]:
result = mf.search_by_package_name("pandas", markdown=True)

**Package:** pandas

**Description:** a powerful python data analysis toolkit for handling relational and labeled data with ease.

**Keywords:** data analysis, data manipulation, dataframe, numerical analysis, machine learning, data science

**Popularity:** 5


---


### `search_by_description`

Search for PyPI packages by matching their description text, using either full-text search or embedding-based cosine similarity, with control over result count and optional Markdown output.

#### Signature
```python
def search_by_description(
    description: str,
    limit: int = 5,
    method: Literal["fts", "cosine"] = "fts",
    markdown: bool = False,
    **kwargs
) -> List[Dict]:
```

#### What it does
- Supports two search methods:
  - **fts** (full-text search): substring match (case-insensitive) on the `description` field.
  - **cosine** (semantic similarity): computes cosine similarity between the query embedding and stored package embeddings.
- Optionally prints results as a Markdown table.
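
As a rough illustration of the **cosine** method, the sketch below ranks packages by the cosine of the angle between a query vector and each stored vector. The three-dimensional vectors are made up; real sentence embeddings have hundreds of dimensions, and the names `cosine_similarity`, `stored`, and `ranked` are assumptions for this example only.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up embeddings standing in for the query and stored package vectors.
query = [1.0, 0.0, 1.0]
stored = {
    "pkg-a": [1.0, 0.0, 1.0],  # same direction as the query
    "pkg-b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "pkg-c": [1.0, 1.0, 0.0],  # partially aligned
}

ranked = sorted(
    stored,
    key=lambda name: cosine_similarity(query, stored[name]),
    reverse=True,
)
print(ranked)  # ['pkg-a', 'pkg-c', 'pkg-b']
```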

In [8]:
result = mf.search_by_description("training and deploying machine learning models on Amazon", markdown=True)

**Package:** sagemaker

**Description:** the sagemaker python sdk is an open-source library designed for training and deploying machine learning models on amazon sagemaker. it supports popular deep learning frameworks like apache mxnet and tensorflow, as well as amazon's own scalable algorithms optimized for sagemaker. users can also deploy custom algorithms in docker containers, making it versatile for various machine learning tasks.

**Keywords:** machine learning, deep learning, model deployment, amazon sagemaker, data science, ai, docker

**Popularity:** 5


---


Full-text search (default method), top 1 result

In [9]:
fts_results = mf.search_by_description(
    "data analysis",
    limit=1,
    method="fts",
    markdown=True
)

**Package:** 310-notebook

**Description:** a jupyterlab extension that provides a notebook interface for interactive data analysis and visualization.

**Keywords:** jupyterlab, notebook, data analysis, visualization

**Popularity:** 5


---


Cosine similarity search, using default embedding model, top 3 results

In [10]:
cosine_results = mf.search_by_description(
    "neural network",
    limit=3,
    method="cosine",
    markdown=True
)

Using default embedding model: sentence-transformers/all-MiniLM-L6-v2.If you want to use a different model, please specify it under the 'embedding_model' parameter.


**Package:** blair-nn

**Description:** this library implements a feedforward neural network (ffnn) model with a single hidden layer, utilizing vanilla backpropagation and stochastic gradient descent for training.

**Keywords:** neural network, feedforward, backpropagation, gradient descent, machine learning

**Popularity:** 3

**Cosine_similarity:** 0.64157634973526


---
**Package:** perceptron-pypi-dhires9196

**Description:** this library provides a framework for building and training perceptrons, a fundamental type of neural network. it offers tools for defining the architecture, initializing weights, and performing backpropagation to optimize model parameters.

**Keywords:** perceptron, neural network, machine learning, deep learning

**Popularity:** 2

**Cosine_similarity:** 0.6225101947784424


---
**Package:** ANN-Implementation-AmanGupta0112

**Description:** this library provides implementations of various artificial neural network (ann) models, including feedforward networks, convolutional networks, and recurrent networks. it offers functionalities for training, testing, and evaluating ann models.

**Keywords:** artificial neural network, ann, machine learning, deep learning

**Popularity:** 1

**Cosine_similarity:** 0.621996283531189


---


### `find_similar_packages`

Search for similar PyPI packages based on a reference package name, with multiple similarity methods and optional Markdown output.

#### Signature
```python
def find_similar_packages(
    reference_package: str,
    limit: int = 5,
    method: Literal["tf-idf", "cosine", "jaccard"] = "tf-idf",
    markdown: bool = False,
) -> List[Dict]:
```

#### What it does
- Computes similarity between the reference and all other packages using one of three methods:
  - **tf-idf** (default)
  - **cosine**: embedding cosine similarity
  - **jaccard**: keyword Jaccard similarity
- Returns the top `limit` most similar packages, excluding the reference itself.
- Optionally prints results as a Markdown table.
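
To make the **tf-idf** option more concrete, here is a small pure-Python sketch of TF-IDF weighting followed by cosine ranking. It assumes whitespace tokenization, a smoothed IDF term, and a comparison over description text; none of these details are confirmed by this document, and all function names and sample descriptions are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Build sparse TF-IDF vectors (term -> weight) for each document."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[name] = {
            # term frequency times smoothed inverse document frequency
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def sparse_cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(reference: str, docs: Dict[str, str], limit: int = 5) -> List[str]:
    """Rank all other documents by similarity to the reference."""
    vectors = tfidf_vectors(docs)
    ref = vectors[reference]
    scores = {
        name: sparse_cosine(ref, vec)
        for name, vec in vectors.items()
        if name != reference  # exclude the reference itself
    }
    return sorted(scores, key=scores.get, reverse=True)[:limit]


# Made-up descriptions for illustration.
docs = {
    "ref-pkg": "data analysis and manipulation toolkit",
    "pkg-a": "a library for data manipulation and analysis",
    "pkg-b": "thread safe live plotting",
}
similar = most_similar("ref-pkg", docs, limit=2)
print(similar)  # ['pkg-a', 'pkg-b']
```

Here `pkg-b` shares no terms with the reference, so its score is zero and it ranks last.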

In [11]:
sim_tf = mf.find_similar_packages("pandas", method="tf-idf", limit=2, markdown=True)

**Package:** mylist

**Description:** a python library for data manipulation and analysis.

**Keywords:** data processing, analysis, machine learning

**Popularity:** 1


---
**Package:** daction

**Description:** a python library for data manipulation and analysis.

**Keywords:** data processing, machine learning, analysis

**Popularity:** 2


---


Embedding cosine similarity

In [12]:
sim_cosine = mf.find_similar_packages("pandas", method="cosine", limit=2, markdown=True)

**Package:** panndas

**Description:** a python library that provides fast, flexible, and expressive data structures designed to make working with relational or labeled data easy and intuitive.

**Keywords:** pandas, neural networks, data processing, machine learning

**Popularity:** 4


---
**Package:** liebres

**Description:** a flexible and powerful data analysis/manipulation library for python on top of sql, providing labeled data structures.

**Keywords:** data processing, sql, pandas, labeled data

**Popularity:** 2


---


Keyword Jaccard similarity

In [13]:
sim_jaccard = mf.find_similar_packages("pandas", method="jaccard", limit=2, markdown=True)

**Package:** superpandas

**Description:** superpandas is a powerful python library that extends the capabilities of pandas, offering enhanced functionality for data manipulation and analysis. it provides a more intuitive and efficient way to work with large datasets.

**Keywords:** pandas, data science, data analysis, data manipulation, machine learning

**Popularity:** 2


---
**Package:** Analyze

**Description:** analyze is a python module designed for comprehensive statistical analysis of dataframes, enabling data scientists, analysts, and machine learning engineers to gain significant insights from datasets with minimal code. it simplifies the process of exploring and understanding data by providing various statistical metrics and visualizations in just five lines of code.

**Keywords:** statistical analysis, dataframe, data science, data analysis, machine learning

**Popularity:** 1


---
