# 101 Hello, Weaviate
https://weaviate.io/developers/academy/zero_to_mvp/hello_weaviate
## Unit overview

In [1]:
from IPython.display import HTML
HTML('<iframe width="640" height="360" src="https://www.youtube.com/embed/FU7l5pr2FmU" allowfullscreen></iframe>')



This course is designed to get you started with Weaviate, so that you can go from being new to Weaviate to building an MVP-level product with Weaviate in a short period of time.

Along the way, you'll develop intuitions about not only how Weaviate works, but also how vectors work, and how vector searches work. You'll also learn how to use Weaviate's client library so that you can get going in a language that you are familiar with.

By the time you're done with these short units, you'll be able to build your own instance of Weaviate with your own data, and have a suite of search tools at your disposal so that you can get the data you want in the format you want it.

## Learning objectives
Here, we will cover:
- What Weaviate is, and what it does.
- How to create your own Weaviate instance on WCS.
- Weaviate clients and how to install them.
- Hands-on experience with Weaviate.

By the time you are finished, you will be able to:
- Broadly describe what Weaviate is.
- Outline what vector search is.
- Create a Weaviate instance on WCS.
- Install your preferred Weaviate client.
- Describe some of Weaviate's capabilities.

# Introduction to Weaviate
https://weaviate.io/developers/academy/zero_to_mvp/hello_weaviate/intro_weaviate
## What is Weaviate?
Weaviate is an open-source vector database. But what does that mean? Let's unpack it here.

In [2]:
HTML('<iframe width="640" height="360" src="https://www.youtube.com/embed/MQgm126pKkU" allowfullscreen></iframe>')

### Vector database
Weaviate is a fantastic tool for retrieving the information you need, quickly and accurately. It does this by being an amazing **vector database**.

You may be familiar with traditional databases such as relational databases that use SQL. A database can catalog, store and retrieve information. A **vector** database can also carry out these tasks, with the key difference being that it can perform them based on similarity.

#### How traditional searches work
Imagine that you are searching a relational database containing articles on cities, to retrieve a list of "major" European cities. Using SQL, you might construct a query like this:

```sql
SELECT city_name, wiki_summary
FROM wiki_city
WHERE (wiki_summary LIKE '%major European city%' OR
       wiki_summary LIKE '%important European city%' OR
       wiki_summary LIKE '%prominent European city%' OR
       wiki_summary LIKE '%leading European city%' OR
       wiki_summary LIKE '%significant European city%' OR
       wiki_summary LIKE '%top European city%' OR
       wiki_summary LIKE '%influential European city%' OR
       wiki_summary LIKE '%notable European city%')
    (… and so on)
```
This query returns cities whose `wiki_summary` column contains any of these phrases (`major European city`, `important European city`, `prominent European city`, and so on).

This works well in many circumstances. However, there are two significant limitations with this approach.

#### Limitations of traditional search
Using this type of search requires you to identify terms that may have been used to describe the concept, which is no easy feat.

What's more, this doesn't solve the problem of how to rank the list of resulting objects.

With the above search query, an entry merely containing a mention of a different European city (i.e. not very relevant) would be given equal weighting to an entry for Paris, or Rome, which would be highly relevant.
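To make both limitations concrete, here is a minimal Python sketch that mimics the `LIKE` matching above with plain substring checks. The sample summaries and the `keyword_match` helper are invented for this illustration; they do not come from a real dataset or from Weaviate.

```python
# Hypothetical summaries, invented for this illustration.
summaries = {
    "Paris": "Paris is a major European city and the capital of France.",
    "Rome": "Rome is one of the great capitals of Europe.",
    "Springfield": "Springfield has a sister-city link with a major European city.",
}

phrases = ["major European city", "important European city"]


def keyword_match(text, phrases):
    """Return True if any phrase appears in the text (like SQL LIKE '%...%')."""
    lowered = text.lower()
    return any(phrase.lower() in lowered for phrase in phrases)


# Every match is returned with equal weight: there is no ranking.
matches = [city for city, text in summaries.items() if keyword_match(text, phrases)]
print(matches)
```

In this toy data, Rome is missed because its summary uses different wording, while Springfield is returned on equal footing with Paris even though it only mentions another city.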

A vector database makes this job simpler by enabling searches based on similarity.

#### Examples of vector search
So, you could perform a query like this in Weaviate:

```
{
  Get {
    WikiCity (
      nearText: { concepts: ["Major European city"] }
    ) { city_name wiki_summary }
  }
}
```

And it would return a list of entries that are ranked by their similarity to the query - the idea of "Major European city".
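If you want to assemble a query like the one above from Python, one option is to build the GraphQL text as a string. This is only a sketch: `build_near_text_query` is a hypothetical helper written for this example, not a function of the Weaviate client libraries.

```python
import json


def build_near_text_query(class_name, concepts, properties, limit=None):
    """Build a GraphQL Get query with a nearText argument, as a string."""
    # json.dumps produces a double-quoted list, which is also valid GraphQL.
    args = [f"nearText: {{ concepts: {json.dumps(concepts)} }}"]
    if limit is not None:
        args.append(f"limit: {limit}")
    arg_text = ", ".join(args)
    prop_text = " ".join(properties)
    return f"{{ Get {{ {class_name} ({arg_text}) {{ {prop_text} }} }} }}"


query = build_near_text_query(
    "WikiCity", ["Major European city"], ["city_name", "wiki_summary"]
)
print(query)
```

Building the string from parameters makes it easy to swap in a different concept or class without retyping the whole query.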

What's more, Weaviate "indexes" the data based on their similarity, making this type of data retrieval lightning-fast.

Weaviate can help you to do all this, and actually a lot more. Another way to think about Weaviate is that it supercharges the way you use information.

> VECTOR VS SEMANTIC SEARCH<br>
A vector search is also referred to as a "semantic search" because it returns results based on the similarity of meaning (therefore "semantic").

### Open-source
Weaviate is open-source. In other words, its [codebase is available online](https://github.com/weaviate/weaviate) for anyone to see and use [$^{\rm{1}}$](https://weaviate.io/developers/academy/zero_to_mvp/hello_weaviate/intro_weaviate#1).

And that is the codebase, regardless of how you use it. So whether you run Weaviate on your own computer, on a cloud computing environment, or through our managed service [Weaviate Cloud Services, or WCS](https://console.weaviate.io/), you are using the exact same technology.

So, if you want, you can run Weaviate for free on your own device, or use our managed service for convenience. You can also take comfort in being able to see exactly what you are running, be a part of the open-source community, and help shape its development.

It also means that your knowledge about Weaviate is transferable between local, cloud, and managed instances of Weaviate. So anything you learn here about Weaviate using WCS will be equally applicable to running it locally, and vice versa. 😉

### Information, made dynamic
We are used to thinking of information as static, like a book. But with Weaviate and modern AI-driven language models, we can do much more than just retrieve static information: we can easily build on top of it. Take a look at these examples:

#### Question answering
Given a list of Wikipedia entries, you could ask Weaviate:
> When was Lewis Hamilton born?

And it would answer with:
> Lewis Hamilton was born on January 7, 1985. ([check for yourself](https://en.wikipedia.org/wiki/Lewis_Hamilton))

The corresponding query:
```
{
  Get {
    WikiArticle (
      ask: {
        question: "When was Lewis Hamilton born?",
        properties: ["wiki_summary"]
      },
      limit: 1
    ) {
      title
      _additional {
        answer {
          result
        }
      }
    }
  }
}
```
The corresponding response:
```
{
  "data": {
    "Get": {
      "WikiArticle": [
        {
          "_additional": {
            "answer": {
              "result": " Lewis Hamilton was born on January 7, 1985."
            }
          },
          "title": "Lewis Hamilton"
        }
      ]
    }
  }
}
```
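As a small illustration of working with such a response in Python, the sketch below parses the JSON shown above and pulls out the answer text. It uses only the standard library and assumes you already have the response as a string.

```python
import json

# The response shown above, as a JSON string.
response_text = """
{
  "data": {
    "Get": {
      "WikiArticle": [
        {
          "_additional": {
            "answer": {
              "result": " Lewis Hamilton was born on January 7, 1985."
            }
          },
          "title": "Lewis Hamilton"
        }
      ]
    }
  }
}
"""

response = json.loads(response_text)
article = response["data"]["Get"]["WikiArticle"][0]
# The answer text has a leading space in this response, so strip it.
answer = article["_additional"]["answer"]["result"].strip()
print(f"{article['title']}: {answer}")
```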

#### Generative search
Or you can synthesize passages using retrieved information with Weaviate:

Here is an example in which we searched Weaviate for an entry on a "racing driver" and asked for the result to be produced in the following format:
> Write a fun tweet encouraging people to read about this: ## {title} by summarizing highlights from: ## {wiki_summary}

Which produces:
> Check out the amazing story of Lewis Hamilton, the 7-time Formula One World Drivers' Championship winner! From his humble beginnings to becoming one of the world's most influential people, his journey is an inspiring one. #LewisHamilton #FormulaOne #Motorsport #Racing

The corresponding query:
```
{
  Get {
    WikiArticle(
      nearText: {
        concepts: ["Racing Driver"]
      }
      limit: 1
    ) {
      title
      wiki_summary
      _additional {
        generate(
          singleResult: {
            prompt: """
              Write a fun tweet encouraging people to read about this: ## {title}
              by summarizing highlights from: ## {wiki_summary}
            """
          }
        ) {
          singleResult
          error
        }
      }
    }
  }
}
```
The corresponding response:
```
{
  "data": {
    "Get": {
      "WikiArticle": [
        {
          "_additional": {
            "generate": {
              "error": null,
              "singleResult": "Check out the amazing story of Lewis Hamilton, the 7-time Formula One World Drivers' Championship winner! From his humble beginnings to becoming a global icon, his journey is an inspiring one. #LewisHamilton #FormulaOne #Motorsport #Racing #Inspiration"
            }
          },
          "title": "Lewis Hamilton",
          "wiki_summary": "Sir Lewis Carl Davidson Hamilton   (born 7 January 1985) is a British racing driver currently competing in Formula One, driving for Mercedes-AMG Petronas Formula One Team. In Formula One, Hamilton has won a joint-record seven World Drivers' Championship titles (tied with Michael Schumacher), and holds the records for the most wins (103), pole positions (103), and podium finishes (191), among others.\nBorn and raised in Stevenage, Hertfordshire, Hamilton joined the McLaren young driver programme in 1998 at the age of 13, becoming the youngest racing driver ever to be contracted by a Formula One team. This led to a Formula One drive with McLaren for six years from 2007 to 2012, making Hamilton the first black driver to race in the series. In his inaugural season, Hamilton set numerous records as he finished runner-up to Kimi R\u00e4ikk\u00f6nen by one point. The following season, he won his maiden title in dramatic fashion\u2014making a crucial overtake at the last corner on the last lap of the last race of the season\u2014to become the then-youngest Formula One World Champion in history.  After six years with McLaren, Hamilton signed with Mercedes in 2013.\nChanges to the regulations for 2014 mandating the use of turbo-hybrid engines saw the start of a highly successful period for Hamilton, during which he won six further drivers' titles. Consecutive titles came in 2014 and 2015 during an intense rivalry with teammate Nico Rosberg. Following Rosberg's retirement in 2016, Ferrari's Sebastian Vettel became Hamilton's closest rival in two championship battles, in which Hamilton twice overturned mid-season point deficits to claim consecutive titles again in 2017 and 2018. His third and fourth consecutive titles followed in 2019 and 2020 to equal Schumacher's record of seven drivers' titles. Hamilton achieved his 100th pole position and race win during the 2021 season. 
\nHamilton has been credited with furthering Formula One's global following by appealing to a broader audience outside the sport, in part due to his high-profile lifestyle, environmental and social activism, and exploits in music and fashion. He has also become a prominent advocate in support of activism to combat racism and push for increased diversity in motorsport. Hamilton was the highest-paid Formula One driver from 2013 to 2021, and was ranked as one of the world's highest-paid athletes by Forbes of twenty-tens decade and 2021. He was also listed in the 2020 issue of Time as one of the 100 most influential people globally, and was knighted in the 2021 New Year Honours. Hamilton was granted honorary Brazilian citizenship in 2022.\n\n"
        }
      ]
    }
  }
}
```

We will cover these and many more capabilities, such as vectorization, summarization and classification, in our units.

For now, keep in mind that Weaviate is a vector database at its core which can also leverage AI tools to do more with the retrieved information.

## Review
In this section, you learned about what Weaviate is and how it works at a very high level. You have also been introduced, at a high level, to vector search as a similarity-based search method.

### Review exercises
What is the difference in the Weaviate codebase between local and cloud deployments?

$\times$ Cloud deployments always include additional modules.<br>
$\times$ Local deployments are optimized for GPU use.<br>
$\times$ Cloud deployments are optimized for scalability.<br>
$\checkmark$ None, they are the same.

What is the best description of vector search?

$\times$ Vector search is a directional search.<br>
$\checkmark$ Vector search is a similarity-based search.<br>
$\times$ Vector search is a number-based search.

### Key takeaways
- Weaviate is an open source vector database.
- The core Weaviate library is the same whether you run it locally, on the cloud, or with WCS.
- Vector searches are similarity-based searches.
- Weaviate can also transform your data after retrieving it before returning it to you.

# Vectors - An overview
https://weaviate.io/developers/academy/zero_to_mvp/hello_weaviate/overview_vectors
## What is a vector?

In [3]:
HTML('<iframe width="640" height="360" src="https://www.youtube.com/embed/iFUeV3aYynI" allowfullscreen></iframe>')

We've covered that Weaviate is a vector database, and that a vector search is similarity-based. But what is a vector?

A vector in this context is just a series of numbers - like `[1, 0]` or `[0.513, 0.155, 0.983, ..., 0.001, 0.932]`. Vectors like these are used to capture meaning as a series of numbers.

This might seem like an odd concept. But in fact, many people have used vectors already without realizing - for example if they have tried photo editing, or MS Paint.

### How do numbers represent meaning?
The RGB system uses numbers to represent colors. For example:
- `(255, 0, 0)` = red
- `(80, 200, 120)` = emerald

In these examples, each number can be thought of as a dial for how red, green or blue a color is.

Now, imagine having hundreds, or even thousands of these dials. That’s how vectors are used to represent meaning. Modern models such as GPT-x, or those used with Weaviate, use vectors in this manner to represent some "essence", or "meaning" of objects. And this can be done for any object type, such as text, code, images, videos and more.

Each vector representation of such "meaning" is called a vector embedding.
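The dial analogy can be tried out with the RGB values above. The sketch below treats each color as a three-number vector and finds which color sits closest to red. The `crimson` entry and the choice of Euclidean distance are assumptions made for this example; real embeddings have far more dimensions and may be compared with a different metric.

```python
import math

# Colors as 3-dimensional vectors (red, green, blue).
colors = {
    "red": (255, 0, 0),
    "emerald": (80, 200, 120),
    "crimson": (220, 20, 60),  # added for this example
}


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = colors["red"]
others = {name: vec for name, vec in colors.items() if name != "red"}
# The color whose vector is nearest to the query is the most similar one.
closest = min(others, key=lambda name: euclidean_distance(query, others[name]))
print(closest)
```

Crimson comes out as the closest match because its dials are set almost the same way as red's, whereas emerald differs on all three.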

## Vector embeddings in Weaviate
Weaviate enables vector searches by indexing and storing data objects and corresponding vector embeddings from machine learning models.

In plain terms, Weaviate processes and organizes your data in such a way that objects can be retrieved based on their similarity to a query. In order for it to perform these tasks at speed, Weaviate does two things that traditional databases do not. They are:
- Quantifying similarity, and
- Indexing vector data

These aspects enable Weaviate to do what it does.

### Quantifying similarity
As we've mentioned, vector searches are similarity-based, but what does that actually mean? How do we determine that two pieces of data are "similar"? What does it mean for two pieces of text, two images, or two objects in general, to be similar?

This is a relatively simple idea that is actually incredibly interesting and intricate once we start to dive into the details.

But for now, you should know that machine learning (ML) models are key to this whole process. Models similar to those that generate text from prompts also power vector searches. Instead of generating new text, these models capture the "meaning" of pieces of text or other media. We will cover this in more detail later on.
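As a rough preview of how similarity can be turned into a number, the sketch below computes cosine similarity between hand-made vectors. The three "embeddings" are invented for illustration; in practice they would come from an ML model, and cosine similarity is just one common choice of measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings"; real ones come from an ML model
# and typically have hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "car": [0.1, 0.2, 0.95],
}

sim_kitten = cosine_similarity(embeddings["cat"], embeddings["kitten"])
sim_car = cosine_similarity(embeddings["cat"], embeddings["car"])
# "cat" should be more similar to "kitten" than to "car".
print(sim_kitten > sim_car)
```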

### Indexing (vector) data
Vector searches can be very computationally intensive.

To overcome this problem, Weaviate uses a combination of indexes, including an approximate nearest neighbor (ANN) index and an inverted index. Respectively, they allow Weaviate to perform extremely fast vector searches and to filter data using Boolean criteria.

We will get into this in more detail later - but for now, it's enough to know that Weaviate can perform fast vector searches as well as filtering.
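To give a feel for how the two kinds of index divide the work, here is a toy Python sketch. It is not how Weaviate is implemented: the objects are hypothetical, and the vector step is a brute-force comparison standing in for an ANN index, which exists precisely to avoid comparing the query against every object.

```python
import math

# Hypothetical objects: each has a category (for filtering) and a toy vector.
objects = {
    1: {"category": "city", "vector": [0.9, 0.1]},
    2: {"category": "city", "vector": [0.2, 0.8]},
    3: {"category": "river", "vector": [0.95, 0.05]},
}

# Inverted index: property value -> set of object ids that have it.
inverted_index = {}
for obj_id, obj in objects.items():
    inverted_index.setdefault(obj["category"], set()).add(obj_id)


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_search(query_vector, category, limit=1):
    """Filter with the inverted index, then rank the candidates by distance."""
    candidates = inverted_index.get(category, set())
    ranked = sorted(
        candidates, key=lambda i: distance(query_vector, objects[i]["vector"])
    )
    return ranked[:limit]


print(filtered_search([1.0, 0.0], "city"))
```

Object 3 is the nearest vector overall, but the filter on `category` removes it before ranking, so the search returns object 1.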

## Review
In this section, you learned about what vectors are and how Weaviate utilizes them at a very high level. You have also been introduced to Weaviate's two key capabilities that help it to enable vector search at speed.

### Review exercise
> Can you describe, in your own words, what vectors are?<br><br>My answer:<br>A vector of dimension $n$ is a list of $n$ numbers and thus can be interpreted as a point in $n$-dimensional space. In the context of artificial intelligence, vectors have many use cases, among them the representation of *embeddings*, i.e., a mathematical representation of tokens (sub-words). Creating "good" embeddings – that is, embeddings that relate to each other like the sub-words they represent (e.g., "king" should relate to "queen" like "man" to "woman") – for a set of tokens is paramount to natural language processing and other areas of machine learning.

Which of these statements are true?

$\times$ Vector search is a directional search.<br>
$\checkmark$ Vector search is a similarity-based search.<br>
$\times$ Vector search is a number-based search.

### Key takeaways
- A vector is a series of numbers that captures the meaning or essence of an object.
- Machine learning models help quantify similarity between different objects, which is essential for vector searches.
- Weaviate uses a combination of an approximate nearest neighbor (ANN) index and an inverted index to perform fast vector searches with filtering.

# Examples 1 - Queries
https://weaviate.io/developers/academy/zero_to_mvp/hello_weaviate/examples_1
## Vectors in action

In [4]:
HTML('<iframe width="640" height="360" src="https://www.youtube.com/embed/zC0CpBiLC3g" allowfullscreen></iframe>')

Let's take a look at a few more examples of what you can do with Weaviate.

First, we will try vector searches by searching through our demo database. You will learn how to use Weaviate to retrieve objects based on their similarity, using various query types such as an input text, vector, or object.

You will also compare and contrast vector search with keyword search, before learning how to combine the two techniques through the use of filters.

## Vector search demo
For our first example, let's look through our demo dataset which contains a small sample of questions from the quiz show Jeopardy!.

Imagine that you're running a quiz night, and want to get some questions around the category of "animals in movies". You could look for word matches - perhaps something like:

```sql
SELECT question, answer
FROM jeopardy_questions
WHERE (question LIKE '%animal%' OR question LIKE '%creature%' OR question LIKE '%beast%')
AND (question LIKE '%movie%' OR question LIKE '%film%' OR question LIKE '%picture%' OR question LIKE '%cinema%')
```

But this is very difficult. You likely need to know the names of the specific animals to carry this out.

Not so much with Weaviate, though. See what happens when we run the following query:

> WE SEARCHED WEAVIATE FOR:<br>animals in movies

See the full query:
```
{
  Get {
    JeopardyQuestion (
      nearText: {
        concepts: ["animals in movies"]
      }
      limit: 3
    ) {
      question
      answer
    }
  }
}
```
Weaviate retrieved these as the top answers:
> **meerkats**: Group of mammals seen here like Timon in *The Lion King*<br>**dogs**: Scooby-Doo, Goofy & Pluto are cartoon versions<br>**The Call of the Wild Thornberrys**: Jack London story about the dog Buck who joins a Nick cartoon about Eliza, who can talk to animals

JSON response:<br>
```
{
  "data": {
    "Get": {
      "JeopardyQuestion": [
        {
          "answer": "meerkats",
          "question": "Group of mammals seen <a href=\"http://www.j-archive.com/media/1998-06-01_J_28.jpg\" target=\"_blank\">here</a>:  [like Timon in <i>The Lion King</i>]"
        },
        {
          "answer": "dogs",
          "question": "Scooby-Doo, Goofy & Pluto are cartoon versions"
        },
        {
          "answer": "The Call of the Wild Thornberrys",
          "question": "Jack London story about the dog Buck who joins a Nick cartoon about Eliza, who can talk to animals"
        }
      ]
    }
  }
}
```

Note just how relevant the results are, even though none of them contains the word "movie", and only one of them mentions "animals" at all!

This is exactly why vector searches are so useful. They can identify related objects without the need to match exact texts.

### Vector similarities demo
What if we run *this* query? What will we get back?
```
{
  Get {
    JeopardyQuestion (
      nearText: {
        concepts: ["European geography"]
      }
      limit: 3
    ) {
      question
      answer
      _additional {
        distance
      }
    }
  }
}
```
Take a look at this response. Do you notice any additional information?
```
{
  "data": {
    "Get": {
      "JeopardyQuestion": [
        {
          "_additional": {
            "distance": 0.15916324
          },
          "answer": "Bulgaria",
          "question": "A European republic: Sofia"
        },
        ...
      ]
    }
  }
}
```
JSON response:
```
{
  "data": {
    "Get": {
      "JeopardyQuestion": [
        {
          "_additional": {
            "distance": 0.15916324
          },
          "answer": "Bulgaria",
          "question": "A European republic: Sofia"
        },
        {
          "_additional": {
            "distance": 0.16247147
          },
          "answer": "Balkan Peninsula",
          "question": "The European part of Turkey lies entirely on this peninsula"
        },
        {
          "_additional": {
            "distance": 0.16832423
          },
          "answer": "Mediterranean Sea",
          "question": "It's the only body of water with shores on the continents of Asia, Africa & Europe"
        }
      ]
    }
  }
}
```
The difference is that the response now contains a `distance` value.

A `distance` is indicative of the degree of similarity between the returned object and the query.

If you're wondering exactly what that means, and who decides how similar any two objects or concepts are, those are great questions! We will cover those in more detail later.

For now, just keep in mind that smaller distances mean two objects are more similar to each other.
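To see this in the response above, the short sketch below reads out the `distance` values in Python and checks their order. The results are copied from the response and reduced to the `answer` and `distance` fields for brevity.

```python
# The three results from the response above, reduced to answer and distance.
results = [
    {"_additional": {"distance": 0.15916324}, "answer": "Bulgaria"},
    {"_additional": {"distance": 0.16247147}, "answer": "Balkan Peninsula"},
    {"_additional": {"distance": 0.16832423}, "answer": "Mediterranean Sea"},
]

distances = [item["_additional"]["distance"] for item in results]

# The results are listed closest-first, so the distances are ascending.
is_ascending = distances == sorted(distances)
print(is_ascending)

# The smallest distance marks the object most similar to the query.
closest = min(results, key=lambda item: item["_additional"]["distance"])
print(closest["answer"])
```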

## Review
### Key takeaways
- Vector searches can be very effective as they can identify related objects without the need for exact text matches.
- When using vector searches, distance values indicate the degree of similarity between the returned object and the query.
- Smaller distances indicate greater similarity.
- Vector searches can be combined with keyword searches and filtering techniques for more refined search results.

# Examples 2 - More than search
https://weaviate.io/developers/academy/zero_to_mvp/hello_weaviate/examples_2
## Beyond vector searches

In [5]:
HTML('<iframe width="640" height="360" src="https://www.youtube.com/embed/ezMZKEFteUA" allowfullscreen></iframe>')

You can do a lot more with Weaviate than simply retrieve static information.

Let's take a look at a couple of examples, where we do more than simply retrieve objects from the database.

We'll use this Wiki entry from which we will extract information:

>"The Sydney Opera House" Wikipedia summary:<br><br>
> The Sydney Opera House is a multi-venue performing arts centre in Sydney. Located on the foreshore of Sydney Harbour, it is widely regarded as one of the world's most famous and distinctive buildings and a masterpiece of 20th-century architecture. Designed by Danish architect Jørn Utzon, but completed by an Australian architectural team headed by Peter Hall, the building was formally opened by Queen Elizabeth II on 20 October 1973 after a gestation beginning with Utzon's 1957 selection as winner of an international design competition. The Government of New South Wales, led by the premier, Joseph Cahill, authorised work to begin in 1958 with Utzon directing construction. The government's decision to build Utzon's design is often overshadowed by circumstances that followed, including cost and scheduling overruns as well as the architect's ultimate resignation. The building and its surrounds occupy the whole of Bennelong Point on Sydney Harbour, between Sydney Cove and Farm Cove, adjacent to the Sydney central business district and the Royal Botanic Gardens, and near to the Sydney Harbour Bridge.<br><br>The building comprises multiple performance venues, which together host well over 1,500 performances annually, attended by more than 1.2 million people. Performances are presented by numerous performing artists, including three resident companies: Opera Australia, the Sydney Theatre Company and the Sydney Symphony Orchestra. As one of the most popular visitor attractions in Australia, the site is visited by more than eight million people annually, and approximately 350,000 visitors take a guided tour of the building each year. 
The building is managed by the Sydney Opera House Trust, an agency of the New South Wales State Government.<br><br>On 28 June 2007, the Sydney Opera House became a UNESCO World Heritage Site, having been listed on the (now defunct) Register of the National Estate since 1980, the National Trust of Australia register since 1983, the City of Sydney Heritage Inventory since 2000, the New South Wales State Heritage Register since 2003, and the Australian National Heritage List since 2005. The Opera House was also a finalist in the New7Wonders of the World campaign list.

### Question-answering demo
For one, Weaviate can extract knowledge from text.
```
{
  Get {
    WikiArticle (
      ask: {
        question: "When did construction for the Sydney Opera House start?",
        properties: ["wiki_summary"]
      },
      limit: 1
    ) {
      title
      _additional {
        answer {
          hasAnswer
          property
          result
          startPosition
          endPosition
        }
      }
    }
  }
}
```
Given this question, Weaviate can not only identify the most relevant object, but also the actual answer, based on the textual information above!
> WEAVIATE SAYS:<br>Construction for the Sydney Opera House started in 1958.

### Generative search
Weaviate can do even more with these entries, too. You can ask it to grab an object from its store, like the Wikipedia entry above for the Sydney Opera House, and generate derivative text.
```
{
  Get {
    WikiArticle(
      nearText: {
        concepts: ["Sydney Opera House"]
      }
      limit: 1
    ) {
      title
      wiki_summary
      _additional {
        generate(
          singleResult: {
            prompt: """
              Write a fun tweet encouraging people to read about this: ## {title}
              by summarizing highlights from: ## {wiki_summary}
            """
          }
        ) {
          singleResult
          error
        }
      }
    }
  }
}
```
For example, the above will generate a Tweet based on the Wikipedia entry!
> WEAVIATE SAYS:<br>Explore the world-famous Sydney Opera House and its incredible architecture! From the iconic design to the amazing performances, there's something for everyone to enjoy. #SydneyOperaHouse #Explore #Architecture #Performances #Experience

This is an example of a `generative search`, where Weaviate has retrieved information, and then leveraged a large language model (LLM) to re-shape it. This is a powerful feature that can transform how you deal with information.

You can vary the prompt to build any number of other outputs from this input.
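The `{title}` and `{wiki_summary}` parts of the prompt are placeholders for properties of the retrieved object. The sketch below imitates that substitution with Python's `str.format`, using a shortened stand-in for the retrieved entry. It only illustrates the templating idea; it is not Weaviate's implementation, and the call to the language model is left out.

```python
# The prompt from the query above; {title} and {wiki_summary} are placeholders
# that stand for properties of the retrieved object.
prompt_template = (
    "Write a fun tweet encouraging people to read about this: ## {title} "
    "by summarizing highlights from: ## {wiki_summary}"
)

# A shortened stand-in for the retrieved object.
retrieved = {
    "title": "Sydney Opera House",
    "wiki_summary": "The Sydney Opera House is a multi-venue performing arts centre in Sydney.",
}

# str.format fills each placeholder with the matching property value.
filled_prompt = prompt_template.format(**retrieved)
print(filled_prompt)
```

Changing `prompt_template` while keeping the same placeholders is all it takes to ask for a different kind of output from the same object.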

### What next?
Tools like Q&A and generative search really start to bring your information to life. In the next sections, we'll get you set up to use Weaviate, and have you performing queries before we wrap up this unit.

## Review
### Key takeaways
- Weaviate can extract knowledge from text using question-answering capabilities, identifying the most relevant object and the actual answer based on the provided text.
- Generative search allows you to retrieve information and reshape or repurpose the content, such as generating a tweet based on a Wikipedia entry.
- These advanced capabilities of Weaviate transform how you interact with and utilize information in your data.

# Database & client setup
https://weaviate.io/developers/academy/zero_to_mvp/hello_weaviate/set_up

## Overview

In [6]:
HTML('<iframe width="640" height="360" src="https://www.youtube.com/embed/PgHCLcqfe10" allowfullscreen></iframe>')

## Options for running Weaviate
You can run Weaviate as a managed service (Weaviate Cloud Services, or WCS), or manage your own instances using Docker, Kubernetes, or Embedded Weaviate.

As mentioned before, the underlying Weaviate code is identical regardless of the method. However, there are some differences to be aware of.

### WCS (Recommended)
[WCS, or Weaviate Cloud Services](https://console.weaviate.cloud/), is a managed SaaS service that requires no maintenance at your end.

As it is managed by Weaviate (the company - the software is not sentient... *yet*), it is the fastest way to create a new instance of Weaviate and requires the least amount of effort for the users.

Weaviate instances on WCS come pre-configured for convenience. They include a number of Weaviate "modules" by default, as well as built-in support for user authentication for security.

WCS includes a free "sandbox" tier, which is our recommended method of running Weaviate throughout this course unless otherwise specified.

### Docker / Kubernetes
You can run Weaviate instances using containerization solutions such as [Docker](https://docs.docker.com/) and/or [Kubernetes](https://kubernetes.io/docs/home/).

This provides you with the same code base as one that is used in WCS, although you will need to manage its configuration and deployment yourself.

Self-managed setups using Docker or Kubernetes will be discussed in standalone, separate units in the future.

If you are familiar with either solution and would prefer to install Weaviate through it, we refer you to our documentation for installation with [Docker-Compose](https://weaviate.io/developers/weaviate/installation/docker-compose) or [Kubernetes](https://weaviate.io/developers/weaviate/installation/kubernetes).

### Embedded Weaviate
We also have an experimental feature called [Embedded Weaviate](https://weaviate.io/developers/weaviate/installation/embedded), where you can directly instantiate a Weaviate database from a client library.

However, it is still in an experimental phase, and is not recommended for anything other than evaluation.

## Get started with WCS
### Sign in to WCS
1. First, access WCS by navigating to the [Weaviate Cloud Console](https://console.weaviate.cloud/), and click on "Sign in with the Weaviate Cloud Services".<br>If you don't have an account with WCS yet, click on the "Register" button and create a new account.
1. Then, sign in with your WCS username and password.

### Create a Weaviate Cluster
To create a new Weaviate Cluster, click the "Create cluster" button.

<img width="95%" src="images/image_1.png">

Then:
1. Select the **Free sandbox** *plan tier*.
1. Provide a *cluster name*, which will become a part of its URL. A suffix will be added to this to ensure uniqueness.
1. Set the `Enable Authentication?` option to `YES`.

Your selections should look like this:

<img width="95%" src="images/image_2.png">

Finally, press *Create* to create your sandbox instance. Note that the sandbox will expire after a set number of days.

<img width="95%" src="images/image_3.png">


<img width="45%" src="images/image_4.png">

Instance creation should take a minute or two, and you will see a tick $\checkmark$ when it's done, indicating that the instance is ready.

> SANDBOX EXPIRY<br>
The sandbox is free, but it will expire after 14 days. After this time, all data in the sandbox will be deleted.<br>If you would like to preserve your sandbox data, you can retrieve your data, or contact us to upgrade to a production SaaS instance.

You can communicate with Weaviate with the available [client libraries](https://weaviate.io/developers/weaviate/client-libraries) or directly with the [RESTful API](https://weaviate.io/developers/weaviate/api/rest) and the [GraphQL API](https://weaviate.io/developers/weaviate/api/graphql).
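For the direct route, a GraphQL query can be sent as a plain HTTP POST. The sketch below prepares such a request with Python's standard library without sending it. It assumes the instance serves GraphQL at `/v1/graphql` and accepts an API key as a bearer token; the URL and key are placeholders, and `build_graphql_request` is a helper made up for this example.

```python
import json
import urllib.request


def build_graphql_request(base_url, api_key, query):
    """Prepare (but do not send) a POST request to the GraphQL endpoint."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/graphql",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values: replace with your own cluster URL and API key.
request = build_graphql_request(
    "https://your-cluster.example.com",
    "YOUR-API-KEY",
    "{ Get { JeopardyQuestion (limit: 1) { question answer } } }",
)
print(request.full_url)

# To actually send it (requires a running instance):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The client libraries wrap this kind of request for you, which is why the rest of this course uses a client rather than raw HTTP.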

## Install Weaviate client
> For the initial release of Weaviate Academy units, our materials are written around Python examples.<br><br>But we are working to add versions for other clients, starting with TypeScript/JavaScript examples. We appreciate your patience as we build up our educational material.

### Available clients
Currently, Weaviate clients are available in:
- Python
- TypeScript/JavaScript
- Java
- Go

### Client capabilities
With these clients you can perform *all* RESTful and GraphQL requests. This means you can use any endpoint, and perform all GraphQL queries directly from your Python, TypeScript/JavaScript, Java or Go scripts.

### Installation
Install your preferred client by following the relevant instructions below:
- Python<br>`$ pip install weaviate-client`
- TypeScript/JavaScript<br>`$ npm install weaviate-ts-client`
- Go<br>`$ go get github.com/weaviate/weaviate-go-client/v4`
- Java<br>`<dependency>`<br>
    `<groupId>io.weaviate</groupId>`<br>
    `<artifactId>client</artifactId>`<br>
    `<version>4.0.0</version>  <!-- Check latest version -->`<br>
`</dependency>`
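
After installing the Python client, it can be useful to confirm which version ended up in your environment. The sketch below uses only the Python standard library. The helper name `installed_version` is our own, and it assumes the client was installed under the distribution name `weaviate-client`, as in the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if the client is installed, otherwise None.
print(installed_version("weaviate-client"))
```

If this prints `None`, the client is not installed in the Python environment that is running the code, which is a common cause of `import weaviate` failing inside a notebook.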

## Review
### Review exercise
Which of the following is not true?

$\times$ We recommend using WCS for Weaviate Academy.<br>
$\times$ The newest versions of Weaviate will only be available on WCS.<br>
$\checkmark$ This unit will cover Docker/Kubernetes deployment.

Which of the following is necessary for a Weaviate instance to use with Weaviate Academy?

$\times$ A paid instance for AWS.<br>
$\times$ OpenID Connect (OIDC) authentication.<br>
$\times$ A self-hosted Docker or Kubernetes instance.<br>
$\checkmark$ None of the above are necessary.

Which of the following is not true about Weaviate clients?

$\times$ Weaviate clients are available in Python, TypeScript/JavaScript, Go and Java.<br>
$\times$ There is only a small subset of GraphQL queries that Weaviate clients cannot perform.<br>
$\checkmark$ Weaviate clients come bundled with Weaviate.

### Key takeaways
- There are multiple ways to run Weaviate.
- The recommended, easiest way to run a Weaviate instance is with WCS.
- Weaviate clients are available in multiple languages.
- Currently, the Academy material is available in Python only.

# Getting hands-on
https://weaviate.io/developers/academy/zero_to_mvp/hello_weaviate/hands_on

## Preparation
> OBTAIN A FREE TRIAL OPENAI API KEY<br><br>This section includes queries that use the OpenAI inference endpoint, so we recommend creating an OpenAI account and obtaining an API key. A free trial key should be sufficient for all of the examples below.

## Overview

In [7]:
HTML('<iframe width="640" height="360" src="https://www.youtube.com/embed/8aHIAM3665c" allowfullscreen></iframe>')

Now that you've set up your own Weaviate instance and installed a client, let's get hands-on with Weaviate.

### Client instantiation
A `client` object can be instantiated for convenient access to your instance of Weaviate. You can set the following parameters:

#### Host URL (required):
This is the location of your Weaviate instance, such as `https://example.weaviate.network`.

#### Authentication information (optional)
If authentication is enabled, you MUST provide your authentication information here. Otherwise the Weaviate instance will not provide access.

#### Additional headers (optional)
You can provide additional headers here. This is used to provide API keys for inference services such as Cohere, Hugging Face or OpenAI.

> Parameter names may vary across libraries.

Putting it together, a client can be instantiated as below:
```python
import weaviate
# Only if authentication enabled; assuming API key authentication
auth_config = weaviate.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY")  # Replace w/ your Weaviate instance API key
# Instantiate the client
client = weaviate.Client(
    url="https://example.weaviate.network",
    auth_client_secret=auth_config,  # Only necessary if authentication enabled
    additional_headers={
        "X-Cohere-Api-Key": "YOUR-COHERE-API-KEY",            # Replace with your Cohere key
        "X-HuggingFace-Api-Key": "YOUR-HUGGINGFACE-API-KEY",  # Replace with your Hugging Face key
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY",            # Replace with your OpenAI key
    }
)
```
> In your own code, you would only need to specify the API keys for the service(s) that you are using.
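
One way to include only the keys you actually use is to build the `additional_headers` dictionary from whichever environment variables are set. This is a minimal sketch, not part of the Weaviate client: the header names come from the example above, while the helper name `build_additional_headers` and the environment variable names are assumptions made for illustration.

```python
import os
from typing import Dict, Mapping

# Header names are taken from the example above; the environment variable
# names on the left are assumptions made for this sketch.
HEADER_BY_ENV_VAR = {
    "COHERE_API_KEY": "X-Cohere-Api-Key",
    "HUGGINGFACE_API_KEY": "X-HuggingFace-Api-Key",
    "OPENAI_API_KEY": "X-OpenAI-Api-Key",
}


def build_additional_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return headers only for the inference services whose keys are set."""
    return {
        header: env[var]
        for var, header in HEADER_BY_ENV_VAR.items()
        if env.get(var)  # skip missing or empty values
    }


# Example with an explicit mapping instead of the real environment
print(build_additional_headers({"OPENAI_API_KEY": "YOUR-OPENAI-API-KEY"}))
# → {'X-OpenAI-Api-Key': 'YOUR-OPENAI-API-KEY'}
```

The returned dictionary could then be passed as `additional_headers` when instantiating the client.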

### Try it out
Now, try running the code below, making sure to replace the Weaviate URL and the API key with those from your WCS instance.

> **Custom comment**<br>If necessary, instantiate a weaviate client by following [these steps](https://weaviate.io/developers/weaviate/quickstart#create-an-instance).

In [8]:
import os
from dotenv import load_dotenv
load_dotenv()
WEAVIATE_CLIENT_URL = os.getenv("WEAVIATE_CLIENT_URL")
WEAVIATE_CLIENT_KEY = os.getenv("WEAVIATE_CLIENT_KEY")
HUGGINGFACE_API_KEY = os.getenv("HUGGINGFACE_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
print("environment variables loaded")
#
import json
import weaviate
auth_config = weaviate.AuthApiKey(api_key=WEAVIATE_CLIENT_KEY)
# Instantiate the client
client = weaviate.Client(
    url=WEAVIATE_CLIENT_URL,
    auth_client_secret=auth_config,
    additional_headers={
        #"X-Cohere-Api-Key": "YOUR-COHERE-API-KEY",   # Replace with your Cohere key
        "X-HuggingFace-Api-Key": HUGGINGFACE_API_KEY, # Replace with your Hugging Face key
        "X-OpenAI-Api-Key": OPENAI_API_KEY            # Replace with your OpenAI key
    }
)
meta_info = client.get_meta()
print(json.dumps(meta_info, indent=2))

environment variables loaded
{
  "hostname": "http://[::]:8080",
  "modules": {
    "generative-cohere": {
      "documentationHref": "https://docs.cohere.com/reference/generate",
      "name": "Generative Search - Cohere"
    },
    "generative-openai": {
      "documentationHref": "https://platform.openai.com/docs/api-reference/completions",
      "name": "Generative Search - OpenAI"
    },
    "generative-palm": {
      "documentationHref": "https://cloud.google.com/vertex-ai/docs/generative-ai/chat/test-chat-prompts",
      "name": "Generative Search - Google PaLM"
    },
    "qna-openai": {
      "documentationHref": "https://platform.openai.com/docs/api-reference/completions",
      "name": "OpenAI Question & Answering Module"
    },
    "ref2vec-centroid": {},
    "reranker-cohere": {
      "documentationHref": "https://txt.cohere.com/rerank/",
      "name": "Reranker - Cohere"
    },
    "text2vec-cohere": {
      "documentationHref": "https://docs.cohere.ai/embedding-wiki/",
 

Congratulations! You've made your first request to a Weaviate API; more specifically to the meta REST endpoint. The Weaviate API allows you to do quite a bit - take a look.

## Weaviate API and the client
### Available APIs
Weaviate uses two API types - REST and GraphQL. They work together to provide a rich set of functionality.

***REST VS GRAPHQL APIS***

The **REST API** is used to:
- Interact with Weaviate for CRUD (Create, Read, Update and Delete) operations, and
- Obtain metadata about the database.

The **GraphQL API** is used to:
- Interact with Weaviate for data searches, for example to retrieve data objects, aggregate information, and explore vector spaces.

You will learn about them over the course of these units.

> ***What is REST?***<br>REST is an acronym for **RE**presentational **S**tate **T**ransfer.<br>A REST API provides multiple endpoints, each with its own URL, that can be used to interact with the API.<br>The endpoints are organized into a hierarchy, with each endpoint representing a resource. The client can then request information about these resources by sending a request to the server.

> ***What is GraphQL?***<br>GraphQL is a query language for APIs.<br>First released by Facebook in 2015, it is now maintained by the GraphQL Foundation.<br>GraphQL is a specification for a query language that can be used to request information from a server. GraphQL is **strongly typed**: the server publishes a schema describing the types of data available, and the client specifies exactly which fields it wants to receive.

### Accessing the REST and GraphQL APIs
Both REST and GraphQL APIs can be accessed using client libraries.

Take the commands shown below. In each example, the two snippets are functionally equivalent:

#### Example 1: REST vs client requests
- REST<br>`curl http://localhost:8080/v1/meta`
- Python<br>`import weaviate`<br>`client = weaviate.Client("http://localhost:8080")`<br>`print(client.get_meta())`

#### Example 2: GraphQL vs client requests
- GraphQL<br>`{`<br>`  Get {`<br>`    WikiArticle {`<br>`      title`<br>`      wiki_summary`<br>`    }`<br>`  }`<br>`}`
- Python<br>`result = client.query.get("WikiArticle", ["title", "wiki_summary"]).do()`
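
To make the comparison concrete, here is a rough sketch of the HTTP requests that sit underneath the two examples, built with the Python standard library only. It constructs the requests without sending them. The `/v1/meta` path comes from the `curl` example above; the `/v1/graphql` path, the `Authorization: Bearer` header for API keys, and the helper names are assumptions made for this sketch rather than a description of what the client library does internally.

```python
import json
import urllib.request
from typing import Dict, Optional


def _auth_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Assumed scheme: send the Weaviate API key as a bearer token."""
    return {"Authorization": f"Bearer {api_key}"} if api_key else {}


def build_meta_request(base_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """REST: a plain GET against the meta endpoint."""
    url = f"{base_url.rstrip('/')}/v1/meta"
    return urllib.request.Request(url, headers=_auth_headers(api_key), method="GET")


def build_graphql_request(base_url: str, query: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """GraphQL: a POST whose JSON body carries the query string."""
    url = f"{base_url.rstrip('/')}/v1/graphql"  # assumed GraphQL path
    headers = {"Content-Type": "application/json", **_auth_headers(api_key)}
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


query = "{ Get { WikiArticle { title wiki_summary } } }"
meta_req = build_meta_request("http://localhost:8080")
gql_req = build_graphql_request("http://localhost:8080", query)
print(meta_req.get_method(), meta_req.full_url)
print(gql_req.get_method(), gql_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running instance. In practice the client library handles this for you, which is why the rest of this unit uses the client.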

Now, let's try out more substantive queries.

## Running queries
### Connect to our demo instance
We will connect to a demo Weaviate instance containing data, and run queries ourselves. The instance has these details:

> DEMO INSTANCE DETAILS<br>`url`: `https://edu-demo.weaviate.network`<br>`Weaviate API key`: `readonly-demo`

With these details see if you can:
- Instantiate a Weaviate client, and
- Check that it works by fetching the metadata as we did above.

Bonus points if you can do it without looking at the snippet below:

In [9]:
auth_config = weaviate.AuthApiKey(api_key=WEAVIATE_CLIENT_KEY)
# Instantiate the client
client = weaviate.Client(
    url=WEAVIATE_CLIENT_URL,
    auth_client_secret=auth_config,
    additional_headers={"X-OpenAI-Api-Key": OPENAI_API_KEY}
)
meta_info = client.get_meta()
print(json.dumps(meta_info, indent=2))

{
  "hostname": "http://[::]:8080",
  "modules": {
    "generative-cohere": {
      "documentationHref": "https://docs.cohere.com/reference/generate",
      "name": "Generative Search - Cohere"
    },
    "generative-openai": {
      "documentationHref": "https://platform.openai.com/docs/api-reference/completions",
      "name": "Generative Search - OpenAI"
    },
    "generative-palm": {
      "documentationHref": "https://cloud.google.com/vertex-ai/docs/generative-ai/chat/test-chat-prompts",
      "name": "Generative Search - Google PaLM"
    },
    "qna-openai": {
      "documentationHref": "https://platform.openai.com/docs/api-reference/completions",
      "name": "OpenAI Question & Answering Module"
    },
    "ref2vec-centroid": {},
    "reranker-cohere": {
      "documentationHref": "https://txt.cohere.com/rerank/",
      "name": "Reranker - Cohere"
    },
    "text2vec-cohere": {
      "documentationHref": "https://docs.cohere.ai/embedding-wiki/",
      "name": "Cohere Module"


### Vector searches
Try running the following query. It searches the `WikiCity` objects for those closest in meaning to the specified text, in this case "Major European city".

In [10]:
res = (
    client
    .query
    .get("WikiCity", ["city_name", "country"])
    .with_near_text({"concepts": ["Major European city"]})
    .with_limit(5)
    .do()
)
#print(json.dumps(res, indent=2))
print(res)

UnexpectedStatusCodeException: Query was not successful! Unexpected status code: 422, with response body: {'error': [{'message': 'no graphql provider present, this is most likely because no schema is present. Import a schema first!'}]}.

In [11]:
res = (
    client
    .query
    .get("WikiCity", ["city_name", "country", "lng", "lat"])
    .with_near_text({"concepts": ["Major European city"]})
    .with_limit(5)
    .do()
)
print(json.dumps(res, indent=2))

UnexpectedStatusCodeException: Query was not successful! Unexpected status code: 422, with response body: {'error': [{'message': 'no graphql provider present, this is most likely because no schema is present. Import a schema first!'}]}.

> EXERCISE<br>Try varying the query concept from "Major European city" to another - what do you see? Is it in line with what you expected?
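
When you experiment with queries, it helps to separate successful responses from error responses before reading the results. The sketch below assumes the response shape used elsewhere in this unit: results under `res["data"]["Get"][<class name>]`, and problems reported under an `"errors"` key. The helper name `extract_objects` and the sample data are hypothetical and hand-written for illustration; they are not real query output.

```python
from typing import Any, Dict, List


def extract_objects(response: Dict[str, Any], class_name: str) -> List[Dict[str, Any]]:
    """Return the objects for `class_name`, or raise if the response reports errors."""
    if "errors" in response:
        messages = "; ".join(
            err.get("message", "unknown error") for err in response["errors"]
        )
        raise RuntimeError(f"Query failed: {messages}")
    return response["data"]["Get"][class_name]


# Illustrative response, hand-written for this sketch (not real query output)
sample = {"data": {"Get": {"WikiCity": [{"city_name": "CityA", "country": "CountryA"}]}}}
for obj in extract_objects(sample, "WikiCity"):
    print(obj["city_name"], "-", obj["country"])
```

An error such as `Cannot query field "WikiCity"` typically means that the instance you are connected to does not contain that class, so it is worth checking which URL the client was instantiated with.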

### Question answering
The below example will search the `WikiCity` objects to look for an answer to the question "When was the London Olympics?". Try it out yourself.

In [12]:
ask = {
  "question": "When was the London Olympics?",
  "properties": ["wiki_summary"]
}
res = (
  client.query
  .get("WikiCity", [
      "city_name",
      "_additional {answer {hasAnswer property result} }"
  ])
  .with_ask(ask)
  .with_limit(1)
  .do()
)
print(json.dumps(res, indent=2))

{
  "errors": [
    {
      "locations": [
        {
          "column": 6,
          "line": 1
        }
      ],
      "message": "Cannot query field \"WikiCity\" on type \"GetObjectsObj\".",
      "path": null
    }
  ]
}


> EXERCISE<br>Try varying the question from "When was the London Olympics?" to another, city-related, question. What do you see?<br>Try to see what types of questions work better than others. Do you notice any patterns?
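
The query above requests the `hasAnswer`, `property` and `result` fields under `_additional { answer { ... } }`. A small helper can pull the answer out of each returned object, as sketched below. The helper name `extract_answer` and the sample object are hypothetical placeholders written for this sketch, not real query output.

```python
from typing import Any, Dict, Optional


def extract_answer(obj: Dict[str, Any]) -> Optional[str]:
    """Return the answer text from one result object, or None if no answer was found."""
    answer = obj.get("_additional", {}).get("answer") or {}
    if answer.get("hasAnswer"):
        return answer.get("result")
    return None


# Hand-written placeholder object (not real query output)
sample_obj = {
    "city_name": "CityA",
    "_additional": {
        "answer": {
            "hasAnswer": True,
            "property": "wiki_summary",
            "result": "an example answer",
        }
    },
}
print(extract_answer(sample_obj))
```

Checking `hasAnswer` first matters because an object can be returned even when no answer could be extracted from it.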

### Generative search
The below example will search the `WikiCity` objects, and then transform the results through the `generative-openai` module. Try it out:

In [13]:
res = (
    client
    .query
    .get("WikiCity", ["city_name", "wiki_summary"])
    .with_near_text({"concepts": ["Popular South-East Asian tourist destination"]})
    .with_limit(3)
    .with_generate(single_prompt="Write a tweet with a potentially surprising fact from {wiki_summary}")
    .do()
)
print(res)
#for city_result in res["data"]["Get"]["WikiCity"]:
#    print(json.dumps(city_result["_additional"], indent=2))

{'errors': [{'locations': [{'column': 6, 'line': 1}], 'message': 'Cannot query field "WikiCity" on type "GetObjectsObj".', 'path': None}]}


> EXERCISE<br>Try varying the prompt from "Write a tweet with a potentially surprising fact from `{wiki_summary}`" to another. (What happens if you remove `{wiki_summary}`?)
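
To build intuition for the exercise, the sketch below models how a single prompt with a `{wiki_summary}` placeholder could be filled in once per retrieved object. It is a simplified illustration of the idea, not Weaviate's implementation; the function name `render_prompt` and the sample object are hypothetical.

```python
import re
from typing import Any, Dict

# Matches placeholders such as {wiki_summary}
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_prompt(template: str, obj: Dict[str, Any]) -> str:
    """Replace each {property} placeholder with that property's value from `obj`."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in obj:
            raise KeyError(f"Object has no property named {name!r}")
        return str(obj[name])

    return PLACEHOLDER.sub(substitute, template)


# Hand-written placeholder object (not real query output)
city = {"city_name": "CityA", "wiki_summary": "CityA is an example city."}
print(render_prompt("Write a tweet with a potentially surprising fact from {wiki_summary}", city))
print(render_prompt("Write a tweet with a potentially surprising fact", city))
```

In this model, a template without any placeholder is returned unchanged, so the prompt carries no information from the retrieved object. That is a useful thing to keep in mind when comparing the outputs in the exercise.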

## Review
### Review exercise
Which of the following is not true about the Weaviate API?

$\times$ Weaviate users can use both REST and GraphQL.<br>
$\times$ The REST API can be used to retrieve the current schema.<br>
$\checkmark$ Both the GraphQL and REST APIs can be used in Weaviate to perform vector searches.<br>
$\times$ None of the above.

Weaviate clients cannot do which of the following?

$\checkmark$ Perform analysis on the retrieved results.<br>
$\times$ Communicate with the Weaviate REST API.<br>
$\times$ Communicate with the Weaviate GraphQL API.

### Key takeaways
- Weaviate uses two API types, REST and GraphQL. REST is used for CRUD operations and metadata, while GraphQL is used for data searches, retrieving data objects, and exploring vector spaces.
- Client libraries are used to access both REST and GraphQL APIs, providing a convenient way to interact with Weaviate instances.
- You have connected to a demo Weaviate instance to run vector searches, question-answering queries, and generative searches.

# Wrap-up
## Unit review
In this unit, we aimed to provide you with an overview of Weaviate.

We did this by covering what Weaviate is and what it can do, then discussing what vector databases and vector search are, before going on to run Weaviate and perform vector searches yourself.

Now, you should have a foundation of knowledge from which we can begin to learn more about Weaviate, including more details on how to build a database and perform queries. Before long, you will be creating your own projects using Weaviate.

### Learning outcomes
Having finished this unit, you should be able to:
- Broadly describe what Weaviate is.
- Outline what vector search is.
- Create a Weaviate instance on WCS.
- Install your preferred Weaviate client (Python for Weaviate Academy).
- Describe some of Weaviate's capabilities.