This repository is deprecated and no longer actively maintained. It contains outdated code examples or practices that do not align with current MongoDB best practices. While the repository remains accessible for reference purposes, we strongly discourage its use in production environments. Users should be aware that this repository will not receive any further updates, bug fixes, or security patches. This code may expose you to security vulnerabilities, compatibility issues with current MongoDB versions, and potential performance problems. Any implementation based on this repository is at the user's own risk. For up-to-date resources, please refer to the MongoDB Developer Center.
Lets Get Started!
Step 1: Set up your environment
Make sure you have Python and pip installed on your system. Create a new directory for your project and navigate to it in your terminal.
Step 2: Install Apache Airflow
Run the following command to install Apache Airflow:
pip3 install apache-airflow
Step 3: Initialize the Airflow database
Initialize the Airflow database by running the following command:
airflow db init
Step 4: Configure Airflow
Open the Airflow configuration file airflow.cfg in a text editor. Locate the [core] section and set the executor parameter to LocalExecutor. In the [mongodb] section, configure the MongoDB connection settings, including host, port, schema, user, and password.
Step 5: Create a DAG file
In your project directory, create a new file with a .py extension, such as my_dag.py.
Open the file in a text editor and import the necessary modules:
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from pymongo import MongoClient
Define a Python function that represents your task. This function will be executed when the DAG is triggered:
def my_task():
# Connect to MongoDB
client = MongoClient('mongodb://your_mongodb_host:your_mongodb_port')
db = client.your_database
collection = db.your_collection
# Perform operations on the MongoDB data
# Example: Fetch documents from the collection
documents = collection.find({})
for document in documents:
print(document)
#Create a DAG object and define its parameters:
#with DAG
(
dag_id='my_dag',
schedule_interval=None, # Set the schedule as per your requirement
start_date=datetime.datetime(2023, 6, 29), # Set the start date
) as dag:
Define a task using the PythonOperator and pass the previously defined function as the python_callable parameter:
my_task_operator = PythonOperator(
task_id='my_task',
python_callable=my_task
)
Set up the dependencies between tasks, if any:
my_task_operator
Save the DAG file.
Step 6: Start the Airflow scheduler and web server
Open a new terminal window.
Run the following command to start the Airflow scheduler:
airflow scheduler
Run the following command to start the Airflow web server:
airflow webserver
Step 7: Access the Airflow UI
Open your web browser and navigate to http://localhost:8080 (or the URL specified by Airflow). You should see the Airflow UI, where you can monitor and manage your DAGs.
That's it! You've deployed Apache Airflow and configured it to use MongoDB as a data source. Your DAG (my_dag.py) will be available in the Airflow UI, and you can trigger it manually or set up a schedule according to your needs.

