# Elasticsearch tutorial
This tutorial is an accessible introduction to Elasticsearch. It explains the fundamental concepts and features in a clear and straightforward manner.

## Authors
Aly ABDELALEEM, Guillaume DELPORTE, Jordi HOORELBEKE

## Preamble
Before we get started with this tutorial, please make sure you have Docker installed on your system. You can download and install Docker by visiting the [official website](https://www.docker.com/).
This tutorial also uses the elasticsearch Python library. If it is not installed on your machine, don't worry: it is installed automatically at the beginning of the tutorial.

## Launching the Elastic stack inside Docker containers

The **Elastic Stack**, formerly known as the ELK Stack, is a powerful combination of open-source tools that includes **Elasticsearch**, **Logstash**, **Kibana**, and additional components like Beats and Elasticsearch's Machine Learning. It is widely used for centralized logging, log analysis, and data visualization. While this tutorial focuses on teaching the basics of Elasticsearch and does not cover the entire Elastic Stack, we will deploy the Elastic Stack using Docker containers for two main reasons:
- **Simplified Configuration**: Deploying Elasticsearch locally can involve complex configurations and dependencies. By running it inside a container, we can abstract away these configuration details, allowing us to focus solely on the basics of Elasticsearch that we aim to cover in this tutorial.
- **Preparation for Future Tutorials**: While Elasticsearch alone would be sufficient for our tutorial, deploying the entire Elastic Stack provides an opportunity for readers to familiarize themselves with the stack's deployment process. This prepares them to explore further tutorials on the Elastic Stack after gaining a solid understanding of Elasticsearch, which we recommend.

That being said, we will still use **Kibana** during this tutorial, specifically its **Dev Tools**, which offer a convenient interface for querying Elasticsearch nodes. This feature makes it easier to experiment with Elasticsearch as we go.

Fortunately, we won't have to handle the configuration of the Docker container manually. Instead, we will utilize the [docker-elk](https://github.com/deviantony/docker-elk) GitHub repository, maintained by deviantony. This repository provides a pre-configured setup for the ELK Stack, making it easier for us to deploy and get started with Elasticsearch.

To get started, let's clone a [forked version](https://github.com/jordi-h/docker-elk) of the GitHub repository mentioned earlier by executing the following command in your terminal:
```bash
git clone https://github.com/jordi-h/docker-elk.git
```
Once the repository is cloned, navigate to the cloned repository directory using the terminal and run the following commands:
```bash
docker-compose up setup # Initialize the Elasticsearch users and groups required by docker-elk
docker-compose up -d # If everything went well and the setup completed without error, start the other stack components
```
If everything goes well, you will have a successfully running Elastic Stack. You can access Elasticsearch at http://localhost:9200/ using the following login credentials:
- Username: elastic
- Password: changeme

Please note that it may take some time for the Kibana container to initialize. Once it is ready, you can access Kibana at http://localhost:5601/ using the same login credentials mentioned above.
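
Because the containers need a moment to start, it can help to poll Elasticsearch until it answers before moving on. The sketch below is one possible way to do that with the Python standard library only. The helper names `build_request` and `wait_until_up` are hypothetical, and the URL and credentials are the ones given above.

```python
import base64
import http.client
import time
import urllib.request


def build_request(url, username, password):
    """Build a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def wait_until_up(url, username, password, attempts=30, delay=2.0):
    """Poll `url` until it answers with HTTP 200; return True on success."""
    for _ in range(attempts):
        try:
            request = build_request(url, username, password)
            with urllib.request.urlopen(request, timeout=5) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or HTTP error: the service is not ready yet.
            pass
        time.sleep(delay)
    return False


# Example call (requires the containers to be running):
# wait_until_up("http://localhost:9200/", "elastic", "changeme")
```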

To stop the containers and wipe the volume so that you can start fresh, use the following commands:
```bash
docker-compose down
docker volume rm docker-elk_elasticsearch
```
By following these steps, you should have the Elastic Stack up and running, ready for this tutorial.

## Let us connect to our Elasticsearch node

To query an Elasticsearch node using Python, we need to install the [elasticsearch library](https://elasticsearch-py.readthedocs.io/en/v7.17.10/) and import the Elasticsearch class, which allows you to establish a connection to an Elasticsearch cluster or node and perform various operations.

In [None]:
%pip install elasticsearch
from elasticsearch import Elasticsearch

Once the library is installed and the Elasticsearch class is imported, we can connect to the Elasticsearch node we deployed in a container earlier, which listens at http://localhost:9200/.

In [None]:
from elasticsearch import Elasticsearch

es = Elasticsearch(
    hosts=['http://localhost:9200'],
    basic_auth=('elastic', 'changeme')  # Credentials
)
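
Creating the client does not by itself prove that the node is reachable. One way to check is to call `es.info()`, which returns basic cluster metadata. The sketch below formats that response. `summarize_info` is a hypothetical helper, and it only assumes that the response contains the `cluster_name` and `version.number` fields.

```python
def summarize_info(info):
    """Return a one-line summary of the response body of `es.info()`."""
    return "cluster '{}' running Elasticsearch {}".format(
        info["cluster_name"], info["version"]["number"]
    )


# With the client created above (requires a running node):
# print(summarize_info(es.info()))
```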

## CRUD Operations

### Creating Indices

In [None]:
# Creating an index for eBay products
es.indices.create(index='products')

In [None]:
# Creating an index for eBay users
es.indices.create(index='users')

In [None]:
# Retrieving the created products index
es.indices.get(index="products")

In [None]:
# Retrieving the created users index
es.indices.get(index="users")
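
Running the creation cells a second time raises an error, because the index already exists. A minimal sketch of an idempotent alternative is shown below. `ensure_index` is a hypothetical helper. It relies only on the client's `indices.exists` and `indices.create` methods and takes the client as an argument.

```python
def ensure_index(es, name):
    """Create index `name` unless it already exists; return True if it was created."""
    if es.indices.exists(index=name):
        return False
    es.indices.create(index=name)
    return True


# With the client created earlier (requires a running node):
# ensure_index(es, "products")
# ensure_index(es, "users")
```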

### Creating documents

In [None]:
# defining products to be added to the products index
prod_1 = {
    "product_id": 1,
    "title": "Apple iPhone 12 Pro",
    "subtitle": "256GB, Graphite, Unlocked",
    "category": {
        "level1": "Electronics",
        "level2": "Cell Phones & Accessories",
        "level3": "Cell Phones & Smartphones",
        "level4": "Apple iPhones"
    },
    "price": {
        "value": 999.99,
        "currency": "USD"
    },
    "condition": "New",
    "seller": {
        "username": "applestore",
        "feedback_score": 100,
        "feedback_count": 5000
    },
    "shipping": {
        "location": "United States",
        "service": "Free Shipping",
        "cost": {
            "value": 0,
            "currency": "USD"
        },
        "estimated_delivery": "2-5 business days"
    },
    "description": {
        "short": "The latest iPhone with advanced camera features.",
        "long": "The Apple iPhone 12 Pro features a 6.1-inch Super Retina XDR display, A14 Bionic chip, and 5G capability. The phone also has an advanced camera system with three 12MP lenses, including a telephoto lens for zooming in on distant subjects. Other features include night mode, deep fusion, and Dolby Vision HDR recording. This unlocked phone comes in a graphite finish and has 256GB of storage. Order yours today and experience the latest in iPhone technology."
    }
}
prod_2={
    "product_id": 2,
    "title": "Sony WH-1000XM4 Wireless Headphones",
    "subtitle": "Industry-leading noise cancellation.",
    "category": {
        "level1": "Electronics",
        "level2": "Portable Audio & Headphones",
        "level3": "Headphones"
    },
    "price": {
        "value": 349.99,
        "currency": "USD"
    },
    "condition": "New",
    "seller": {
        "username": "bestbuy",
        "feedback_score": 98.5,
        "feedback_count": 10000
    },
    "shipping": {
        "location": "United States",
        "service": "Free Shipping",
        "cost": {
            "value": 0,
            "currency": "USD"
        },
        "estimated_delivery": "2-5 business days"
    },
    "description": {
        "short": "Experience the best in noise-cancelling technology.",
        "long": "The Sony WH-1000XM4 wireless headphones offer industry-leading noise cancellation, as well as advanced features like touch controls, wear detection, and voice assistant compatibility. The headphones also feature 30 hours of battery life, quick charging, and high-quality sound with LDAC and DSEE Extreme. These headphones are perfect for audiophiles and frequent travelers who want to experience the ultimate in sound quality and noise cancellation. Order yours today and start enjoying your music like never before."
    }
}
prod_3 = {
    "product_id": 3,
    "title": "Samsung Galaxy S21 Ultra",
    "subtitle": "The ultimate smartphone experience.",
    "category": {
        "level1": "Cell Phones & Accessories",
        "level2": "Cell Phones & Smartphones",
        "level3": "Samsung Galaxy S21"
    },
    "price": {
        "value": 1199.99,
        "currency": "USD"
    },
    "condition": "New",
    "seller": {
        "username": "samsung_official",
        "feedback_score": 99.5,
        "feedback_count": 25000
    },
    "shipping": {
        "location": "United States",
        "service": "Free Shipping",
        "cost": {
            "value": 0,
            "currency": "USD"
        },
        "estimated_delivery": "2-5 business days"
    },
    "description": {
        "short": "Experience the pinnacle of smartphone technology.",
        "long": "The Samsung Galaxy S21 Ultra offers advanced features like a dynamic 6.8-inch AMOLED 2X display, professional-grade camera system with four lenses, and the powerful Exynos 2100 or Snapdragon 888 processor. With up to 16GB RAM and 512GB storage, it's the ultimate device for power users and mobile photographers. The Galaxy S21 Ultra also supports S-Pen input for the first time in the S series, bringing an even more versatile user experience."
    }
}

prod_4 = {
    "product_id": 4,
    "title": "Apple MacBook Pro 16\" (2022)",
    "subtitle": "Power meets pro.",
    "category": {
        "level1": "Computers/Tablets & Networking",
        "level2": "Laptops & Netbooks",
        "level3": "Apple Laptops"
    },
    "price": {
        "value": 2399.99,
        "currency": "USD"
    },
    "condition": "New",
    "seller": {
        "username": "apple_official",
        "feedback_score": 99.7,
        "feedback_count": 30000
    },
    "shipping": {
        "location": "United States",
        "service": "Free Shipping",
        "cost": {
            "value": 0,
            "currency": "USD"
        },
        "estimated_delivery": "3-7 business days"
    },
    "description": {
        "short": "Redefine what you can do with a laptop.",
        "long": "The Apple MacBook Pro 16\" (2022) is powered by the new M1 Pro or M1 Max chips, giving you extraordinary CPU, GPU, and machine learning performance. With an immersive 16-inch Liquid Retina XDR display, advanced thermal design, and up to 8TB of storage, it’s the most powerful MacBook Pro ever made. Whether you’re a creative professional or just want the best, the new MacBook Pro lets you push the limits of what is possible."
    }
}
prod_5 = {
    "product_id": 5,
    "title": "Apple iPhone 12 Pro",
    "subtitle": "5G speed. A14 Bionic. Pro camera system.",
    "category": {
        "level1": "Cell Phones & Accessories",
        "level2": "Cell Phones & Smartphones",
        "level3": "Apple iPhone"
    },
    "price": {
        "value": 999.99,
        "currency": "USD"
    },
    "condition": "New",
    "seller": {
        "username": "apple_inc",
        "feedback_score": 99.8,
        "feedback_count": 50000
    },
    "shipping": {
        "location": "United States",
        "service": "Expedited Shipping",
        "cost": {
            "value": 20,
            "currency": "USD"
        },
        "estimated_delivery": "1-3 business days"
    },
    "description": {
        "short": "Meet the new generation of iPhone - iPhone 12 Pro.",
        "long": "The Apple iPhone 12 Pro comes with a powerful A14 Bionic chip, a Pro camera system for unbelievable low-light photography, and a Ceramic Shield front cover, offering four times better drop performance. It features a 6.1-inch Super Retina XDR display, the largest ever on an iPhone, and has a surgical-grade stainless steel band. With 5G support, you can download and stream content at the highest quality. The Pro camera system takes low-light photography to the next level, with an even bigger jump on iPhone 12 Pro Max. And Ceramic Shield delivers four times better drop performance."
    }
}

# defining users to be added to the users index
user_1={
  "userId": 1,
  "username": "applestore",
  "accountType": "BUSINESS",
  "registrationMarketplaceId": "EBAY_US",
  "businessAccount": {
    "name": "Apple Inc.",
    "email": "contact@apple.com",
    "doingBusinessAs": "Apple Store",
    "primaryPhone": {
      "countryCode": "US",
      "number": "555-123-4567",
      "phoneType": "MOBILE"
    }
  },
  "address": {
    "addressLine1": "1 Infinite Loop",
    "city": "Cupertino",
    "stateOrProvince": "CA",
    "postalCode": "95014",
    "country": "US"
  },
  "primaryContact": {
    "firstName": "Tim",
    "lastName": "Cook"
  },
  "feedback_score": 100,
  "feedback_count": 5000
}

user_2={
    "userId": 2,
    "username": "bestbuy",
    "accountType": "BUSINESS",
    "registrationMarketplaceId": "AMAZON_US",
    "businessAccount": {
        "name": "Best Buy Inc.",
        "email": "info@bestbuy.com",
        "doingBusinessAs": "Best Buy",
        "primaryPhone": {
            "countryCode": "US",
            "number": "702-382-9102",
            "phoneType": "LANDLINE"
        }
    },
    "address": {
        "addressLine1": "7601 Penn Ave S",
        "addressLine2": "",
        "city": "Richfield",
        "stateOrProvince": "MN",
        "postalCode": "55423",
        "country": "US"
    },
    "primaryContact": {
        "firstName": "Pierre",
        "lastName": "Omidyar"
    },
    "feedback_score": 98.5,
    "feedback_count": 10000
}


# Adding the documents to the products index
res_1 = es.index(index='products', id=1, document=prod_1)
res_2 = es.index(index='products', id=2, document=prod_2)
res_3 = es.index(index='products', id=3, document=prod_3)
res_4 = es.index(index='products', id=4, document=prod_4)
res_5 = es.index(index='products', id=5, document=prod_5)

# Adding the documents to the users index
res_1 = es.index(index='users', id=1, document=user_1)
res_2 = es.index(index='users', id=2, document=user_2)
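
Indexing documents one call at a time is fine for a handful of documents. For larger batches, the library's `helpers.bulk` function sends many actions in one request. The sketch below only builds the actions. `build_bulk_actions` is a hypothetical helper, and using `product_id` as the document ID is an assumption that mirrors the IDs chosen above.

```python
def build_bulk_actions(index, documents, id_field):
    """Yield one bulk "index" action per document, using `id_field` as the document ID."""
    for document in documents:
        yield {
            "_index": index,
            "_id": document[id_field],
            "_source": document,
        }


# With the client and the products defined above (requires a running node):
# from elasticsearch import helpers
# products = [prod_1, prod_2, prod_3, prod_4, prod_5]
# helpers.bulk(es, build_bulk_actions("products", products, "product_id"))
```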

In [None]:
# searching for all added products
search_results = es.search(index="products", filter_path=['hits.hits._*'])

# print search results
for hit in search_results["hits"]["hits"]:
    print(hit["_source"])
    print("\n")
    
# searching for all added users
search_results = es.search(index="users", filter_path=['hits.hits._*'])

# print search results
for hit in search_results["hits"]["hits"]:
    print(hit["_source"])
    print("\n")

### Retrieve a document

In [None]:
# let's get product 1 
response = es.get(index="products", id=1)
print(response)

### Update a document

In [None]:
# current product 1 price value is 999.99
response = es.get(index="products", id=1)
print("current price is {}".format(response["_source"]["price"]))

# updating product 1 price value to 800
update_body = {
        "price": {
            "value": 800,
            "currency": "USD"
        }
    }

# update the document
es.update(index="products", id=1, doc=update_body)

# Now we get product one again
response = es.get(index="products", id=1)
print("updated price is {}".format(response["_source"]["price"]))
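
When `es.update` receives a partial document through `doc`, Elasticsearch merges it into the stored document, recursing into nested objects, so fields that are not mentioned are kept. The sketch below is a local model of that merge in plain Python, not Elasticsearch code. `merge_partial` is a hypothetical name, and the product is trimmed to two fields.

```python
import copy


def merge_partial(existing, partial):
    """Return a copy of `existing` with `partial` merged in, recursing into nested objects."""
    merged = copy.deepcopy(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars and lists are replaced as a whole.
            merged[key] = copy.deepcopy(value)
    return merged


product = {"title": "Apple iPhone 12 Pro", "price": {"value": 999.99, "currency": "USD"}}
updated = merge_partial(product, {"price": {"value": 800}})
print(updated["price"])  # {'value': 800, 'currency': 'USD'}
```

Under this model, sending only the new price value would be enough; the currency does not need to be repeated.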

### Delete a document

In [None]:
# let's delete user 1
response = es.delete(index="users", id=1)

In [None]:
# searching for all added users
search_results = es.search(index="users", filter_path=['hits.hits._*'])

# print search results
for hit in search_results["hits"]["hits"]:
    print(hit["_source"])
    print("\n")

## Searching Data
The goal of this section is to run basic searches with the client's search() method. It takes the target index and a query, written as a dictionary in Elasticsearch's Query DSL, and returns a dictionary-like response whose matching documents are listed under hits.hits.
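
The shape of a search response can be illustrated with a small helper that extracts the total number of matches and, for each hit, its ID, relevance score, and title. This is a sketch: `summarize_hits` is a hypothetical name, and the sample response is hand-written and trimmed to the fields used here. It expects an unfiltered response, since a `filter_path` such as `hits.hits._*` removes `hits.total`.

```python
def summarize_hits(response):
    """Return (total, rows), where each row is (id, score, title) for one hit."""
    total = response["hits"]["total"]["value"]
    rows = [
        (hit["_id"], hit["_score"], hit["_source"].get("title"))
        for hit in response["hits"]["hits"]
    ]
    return total, rows


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {
                "_index": "products",
                "_id": "1",
                "_score": 0.5,
                "_source": {"title": "Apple iPhone 12 Pro"},
            },
        ],
    }
}

total, rows = summarize_hits(sample_response)
print(total, rows)
```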


### Match Query
The match query is the standard query for full-text search on a single field. The search text is analyzed into terms, and by default a document matches if the field contains at least one of them. For instance, to search for "Apple" in the "title" field:

In [None]:
response = es.search(
    index='products',
    query={
        "match": {
                "title": "Apple"
        }
    }
)

for hit in response["hits"]["hits"]:
    print(hit["_source"])
    print("\n")

### Match Phrase Query
This query is used when you need documents that contain an exact phrase, that is, all the terms next to each other and in the same order. For instance, to find documents where the title field contains the exact phrase "Apple iPhone 12 Pro":

In [None]:
response = es.search(
    index='products',
    query={
        "match_phrase": {
            "title": "Apple iPhone 12 Pro"
        }
    }
)

for hit in response["hits"]["hits"]:
    print(hit["_source"])
    print("\n")
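
The difference between `match` and `match_phrase` can be modeled in a few lines of plain Python. This is a simplified sketch, not how Elasticsearch is implemented: the `tokenize` function (lowercase, split on whitespace) is only a rough stand-in for a real analyzer, and all function names are hypothetical.

```python
def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def matches_any_term(query, field_value):
    """Model of `match` with the default OR operator: one shared term is enough."""
    return bool(set(tokenize(query)) & set(tokenize(field_value)))


def matches_phrase(query, field_value):
    """Model of `match_phrase`: all query terms must appear consecutively, in order."""
    query_terms = tokenize(query)
    field_terms = tokenize(field_value)
    width = len(query_terms)
    if width == 0:
        return False
    return any(
        field_terms[start:start + width] == query_terms
        for start in range(len(field_terms) - width + 1)
    )


macbook_title = 'Apple MacBook Pro 16" (2022)'
print(matches_any_term("Apple iPhone 12 Pro", macbook_title))  # True: "apple" and "pro" are shared
print(matches_phrase("Apple iPhone 12 Pro", macbook_title))    # False: the phrase is not present
print(matches_phrase("apple iphone 12 pro", "Apple iPhone 12 Pro"))  # True
```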

### Range Query
This query finds documents where a field's value falls within a given range. For example, to find products with a price value between 500 and 1000 USD (both bounds included):

In [None]:
response = es.search(
    index='products',
    query={
        "range": {
            "price.value": {
                "gte": 500,
                "lte": 1000
            }
        }
    }
)

for hit in response["hits"]["hits"]:
    print(hit["_source"])
    print("\n")

### Boolean Query
This query combines multiple queries with boolean logic. For example, to find products that have "Apple" in their title and a price value below 900 USD:

In [None]:
response = es.search(
    index='products',
    query={
        "bool": {
            "must": [
                {"match": {"title": "Apple"}},
                {"range": {"price.value": {"lt": 900}}}
            ]
        }
    }
)

for hit in response["hits"]["hits"]:
    print(hit["_source"])
    print("\n")

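We can see that only one product is returned: product 1, the iPhone whose price we lowered to 800 USD in the update step. The other products with "Apple" in their title (product 4 at 2399.99 USD and product 5 at 999.99 USD) are excluded by the price condition.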
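
A bool query also accepts a `filter` clause. Conditions placed there must match, like those in `must`, but they do not contribute to the relevance score, which suits yes/no conditions such as a price bound. The sketch below builds such a query; `title_under_price` is a hypothetical helper name.

```python
def title_under_price(text, max_price):
    """Build a bool query: score on the title match, filter on the price bound."""
    return {
        "bool": {
            "must": [{"match": {"title": text}}],
            "filter": [{"range": {"price.value": {"lt": max_price}}}],
        }
    }


query = title_under_price("Apple", 900)

# With the client created earlier (requires a running node):
# response = es.search(index="products", query=query)
```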

### Full-Text Search
The multi_match query searches for the same text in several fields at once, here the title and both description fields:

In [None]:
response = es.search(
    index='products',
    query={
        "multi_match": {
            "query": "Apple",
            "fields": ["title", "description.short", "description.long"]
        }
    }
)

for hit in response["hits"]["hits"]:
    print(hit["_source"])
    print("\n")
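
With `multi_match`, individual fields can be given more weight by appending `^` and a boost factor to the field name, for example so that a match in the title counts more than one in the long description. The sketch below builds such a query; `boosted_multi_match` is a hypothetical helper and the boost value of 3 is an arbitrary choice for illustration.

```python
def boosted_multi_match(text, boosts):
    """Build a multi_match query from a {field: boost} mapping using the `field^boost` syntax."""
    fields = [
        field if boost == 1 else f"{field}^{boost}"
        for field, boost in boosts.items()
    ]
    return {"multi_match": {"query": text, "fields": fields}}


query = boosted_multi_match(
    "Apple",
    {"title": 3, "description.short": 1, "description.long": 1},
)
print(query["multi_match"]["fields"])  # ['title^3', 'description.short', 'description.long']

# With the client created earlier (requires a running node):
# response = es.search(index="products", query=query)
```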

### Fuzzy Search
Fuzzy search finds terms that are similar to the search term, which is useful when the exact spelling is uncertain. For example, if "Apple" is mistyped as "Apole", the query still returns the documents whose title contains "Apple".

In [None]:
# Fuzzy Query
response = es.search(
    index='products',
    query={
        "fuzzy": {
            "title": {
                "value": "Apole",
                "fuzziness": 2
            }
        }
    }
)

for hit in response["hits"]["hits"]:
    print(hit["_source"])
    print("\n")

We can see that even though 'Apple' was misspelled, the fuzzy search was able to find it. The fuzziness parameter sets the maximum edit distance, that is, the number of single-character changes allowed between the search term and a matching term. Elasticsearch accepts 0, 1, 2, or AUTO.
Choose it carefully, because a higher value can return many irrelevant results and slow the search down.
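
To make the notion of edit distance concrete, here is a sketch of the classic Levenshtein distance in plain Python. It is an illustration only, not Elasticsearch's implementation: by default Elasticsearch also counts a swap of two adjacent characters as a single edit, which this version does not. The function name `edit_distance` is hypothetical and the comparison is case-sensitive.

```python
def edit_distance(source, target):
    """Classic Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (source_char != target_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


print(edit_distance("apple", "apple"))  # 0: identical terms
print(edit_distance("apole", "apple"))  # 1: one substitution (o -> p)
print(edit_distance("aple", "apple"))   # 1: one missing character
```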

## Going Further
This tutorial covered the essentials of Elasticsearch: index creation, CRUD operations, and a few search queries. This is only a glimpse of what Elasticsearch can do.

Going deeper, you can explore its advanced search capabilities, scalability, performance optimization techniques, and analytics features.

This tutorial gives you a foundation to build on, and we highly recommend continuing your learning journey from here.

Here are some recommended sources to dive deeper into Elasticsearch:
- Official Elasticsearch Documentation: The official documentation provides comprehensive information on Elasticsearch's features, concepts, APIs, and usage. It covers various topics, from basic setup to advanced search techniques and cluster management. You can find it at https://www.elastic.co/guide/index.html.
- For a quick and accessible tutorial, we recommend checking out the Elasticsearch tutorial available at [Tutorialspoint](https://www.tutorialspoint.com/elasticsearch/index.htm). This tutorial provides a user-friendly approach to learning Elasticsearch, offering step-by-step explanations and examples.
- We highly recommend following the tutorial series created by LisaHJung, a Senior Developer Advocate at Elastic, which is available at https://github.com/LisaHJung/Beginners-Crash-Course-to-Elastic-Stack-Series-Table-of-Contents. This comprehensive tutorial is an excellent learning resource, especially for beginners who want to explore the Elastic Stack and fully grasp the capabilities of Elasticsearch.

## Resources
[elastic.co](https://www.elastic.co/)

[Elastic Stack (ELK Stack)](https://www.techtarget.com/searchitoperations/definition/Elastic-Stack)

[deviantony/docker-elk GitHub Repository](https://github.com/deviantony/docker-elk)

[Run Elastic stack (ELK) on Docker Containers with Docker Compose](https://computingforgeeks.com/run-elastic-stack-elk-on-docker/)

[Elasticsearch API Documentation](https://elasticsearch-py.readthedocs.io/en/7.x/api.html#elasticsearch)