Schema-free Document Database

Schema-free document database using open-source FerretDB, "an open-source proxy, converting the MongoDB 6.0+ wire protocol queries to SQL - using PostgreSQL or SQLite as a database engine."

FerretDB is pitched as the "de-facto open-source substitute to MongoDB." Unfortunately, FerretDB does not (yet) support nested JSON arrays, which are one of the best parts of a MongoDB document database. However, another great aspect of MongoDB databases is the ease of dumping documents with different schemas into a collection. Plus, FerretDB's service supports requests sent in a Python script using the pymongo library.

Here's an example of how data from a CSV file can be parsed, manipulated, and inserted into the document database.

Input

id	user_name	post	post_time
1	FionaA	"Hello world!"	2023-07-21 08:14:44.826259

Output

{
  "_id": {
    "$oid": "64ba4cc917022a377d337a25"
  },
  "id": "1",
  "user_name": "FionaA",
  "post": "Hello world!",
  "post_time": "2023-07-21 08:14:44.826259",
  "words": ["Hello", "world!"]
}

Contents

Set up
Usage
Graphic User Interface (GUI)

Set up

Set up database

Use Docker to install FerretDB.

$ docker pull ghcr.io/ferretdb/ferretdb:1.6.0

Start the FerretDB service in the background.

$ docker compose up -d.

To verify that the service is running as a daemon with the following command, replace <DATABASE> with the name of a database on the FerretDB cluster. This command will open a shell based on mongosh, in which you can interact with the target database.

$ docker run --rm -it --network=ferretdb --entrypoint=mongosh mongo   "mongodb://username:password@ferretdb/<DATABASENAME>?authMechanism=PLAIN"

To explore the FerretDB shell's commands, enter:

DATABASENAME> db.help()

To exit the shell, enter: Ctrl+C twice.

To stop the FerretDB service, enter:

$ docker compose down

Set up Python package

Create a virtual environment in Python 3.11.

Install dependencies.

$ pip install --r requirements.txt

Usage

Import a CSV file into the document database.

$ python src/main.py --database-name <DATABASE> --collection-name <COLLECTION> import example.data.csv

Between parsing a CSV row into a dictionary and inserting it into the document database, you can manipulate the data, adding fields and enrichments. There are no limitations on how many new key-value pairs can be added to a row because MongoDB is flexible and does not require a rigid schema. However, due to FerretDB's limitations, no key's value can be a nested JSON array.

    # Stream the CSV file
    for data in yield_data(datafile=datafile):
        # ---------- #
        # Do things to the CSV dict row...
        data.update({"words": data["post"].split()})
        # ---------- #

        # Dump the data in the collection
        collection.insert_one(document=data)

Export the collection to a JSON file.

$ python src/main.py --database-name <DATABASE> --collection-name <COLLECTION> export

Graphic User Interface

The best open-source option for visually exploring the FerretDB / MongoDB database contents is an extension in VSCode for MongDB, which has an Apache 2.0 license. To connect your database to VSCode, the database service must first be running.

Open VSCode.
Install the MongDB for VSCode extension.
At the top of the screen, click View > Command Palette > MongoDB Connect with String Connection.
Input the connection string URI: "mongodb://username:password@127.0.0.1/ferretdb?authMechanism=PLAIN"
Make sure the Activity Bar is visible in VSCode. (View > Appearance > Activity Bar)
On the Activity Bar, scroll and find the MongoDB icon (an upright leaf), click it, and navigate your FerretDB / MongoDB connections and their databases.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
config.json		config.json
example.data.csv		example.data.csv
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

src

src

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

config.json

config.json

example.data.csv

example.data.csv

requirements.txt

requirements.txt

Repository files navigation

Schema-free Document Database

Set up

Set up database

Set up Python package

Usage

Graphic User Interface

About

Releases

Packages

Languages

License

medialab/schema-free-doc-db

Folders and files

Latest commit

History

Repository files navigation

Schema-free Document Database

Set up

Set up database

Set up Python package

Usage

Graphic User Interface

About

Resources

License

Stars

Watchers

Forks

Languages