This repository has been archived by the owner on Sep 19, 2024. It is now read-only.

mongodb-developer/Airflow-MongoDB


Notice: Repository Deprecation

This repository is deprecated and no longer actively maintained. It contains outdated code examples or practices that do not align with current MongoDB best practices. While the repository remains accessible for reference purposes, we strongly discourage its use in production environments. Users should be aware that this repository will not receive any further updates, bug fixes, or security patches. This code may expose you to security vulnerabilities, compatibility issues with current MongoDB versions, and potential performance problems. Any implementation based on this repository is at the user's own risk. For up-to-date resources, please refer to the MongoDB Developer Center.


Let's Get Started!

Step 1: Set up your environment

Make sure you have Python and pip installed on your system. Create a new directory for your project and navigate to it in your terminal.
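For example (the directory name airflow-mongodb-demo is just an illustration):

```shell
# Create a project directory and move into it
mkdir -p airflow-mongodb-demo
cd airflow-mongodb-demo
# Confirm Python 3 is available
python3 --version
```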

Step 2: Install Apache Airflow

Run the following command to install Apache Airflow:

pip3 install apache-airflow

Step 3: Initialize the Airflow database

Initialize the Airflow database by running the following command:

airflow db init

Step 4: Configure Airflow

Open the Airflow configuration file airflow.cfg in a text editor. Locate the [core] section and set the executor parameter to LocalExecutor (note that LocalExecutor requires a metadata database other than SQLite, such as PostgreSQL or MySQL). The MongoDB connection settings (host, port, schema, user, and password) are not stored in airflow.cfg; instead, create an Airflow connection that holds them, either in the Airflow UI (Admin > Connections) or with the airflow connections add CLI command.
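Airflow can also pick up connections from environment variables named AIRFLOW_CONN_&lt;CONN_ID&gt;. A minimal sketch, assuming the MongoDB provider's mongo connection type; the host, port, credentials, and database below are placeholders:

```shell
# Airflow reads connections from AIRFLOW_CONN_* environment variables.
# The URI scheme names the connection type ('mongo'); everything else
# here is a placeholder to replace with your own values.
export AIRFLOW_CONN_MONGO_DEFAULT='mongo://your_user:your_password@your_mongodb_host:27017/your_database'
```

Set this variable in the environment of both the scheduler and the web server so every Airflow process sees the same connection.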

Step 5: Create a DAG file

In your project directory, create a new file with a .py extension, such as my_dag.py.

Open the file in a text editor and import the necessary modules:

import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from pymongo import MongoClient

Define a Python function that represents your task. This function will be executed when the DAG is triggered:

def my_task():
    # Connect to MongoDB
    client = MongoClient('mongodb://your_mongodb_host:your_mongodb_port')
    db = client.your_database
    collection = db.your_collection
    
    # Perform operations on the MongoDB data
    # Example: Fetch documents from the collection
    documents = collection.find({})
    for document in documents:
        print(document)
Create a DAG object and define its parameters:

with DAG(
    dag_id='my_dag',
    schedule_interval=None,  # Set the schedule as per your requirements
    start_date=datetime.datetime(2023, 6, 29),  # Set the start date
) as dag:

Define a task inside the with block using the PythonOperator, and pass the previously defined function as the python_callable parameter:

    my_task_operator = PythonOperator(
        task_id='my_task',
        python_callable=my_task,
    )

Set up the dependencies between tasks, if any. With a single task there is nothing to chain; with several tasks, use the >> operator (for example, first_task >> second_task).

Save the DAG file.
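If your MongoDB deployment requires authentication, any special characters in the username or password must be percent-encoded before they go into the connection URI, since pymongo rejects unescaped ones. A minimal standard-library sketch; the credentials and host below are placeholders:

```python
from urllib.parse import quote_plus

# Placeholder credentials -- replace with your own.
user = quote_plus("your_user")
password = quote_plus("p@ss/word")  # '@' and '/' must be escaped in a URI
uri = f"mongodb://{user}:{password}@your_mongodb_host:27017/your_database"
print(uri)
```

The encoded URI can then be passed straight to MongoClient in place of the plain one used in my_task above.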

Step 6: Start the Airflow scheduler and web server

Open a new terminal window.

Run the following command to start the Airflow scheduler:

airflow scheduler

In a separate terminal, run the following command to start the Airflow web server:

airflow webserver

Step 7: Access the Airflow UI

Open your web browser and navigate to http://localhost:8080 (or the URL specified by Airflow). You should see the Airflow UI, where you can monitor and manage your DAGs.

That's it! You've deployed Apache Airflow and configured it to use MongoDB as a data source. Your DAG (my_dag.py) will be available in the Airflow UI, and you can trigger it manually or set up a schedule according to your needs.

About

Walkthrough on how to deploy Airflow with MongoDB
