# A Fullstack AI Image Generation Web App with Stable Diffusion, FastAPI and React

As a fun project for a hot summer weekend, I built a minimal fullstack text-to-image AI web app with [Stable Diffusion](https://stability.ai/stablediffusion) deployed through [Amazon SageMaker JumpStart](https://aws.amazon.com/sagemaker/jumpstart), [FastAPI](https://fastapi.tiangolo.com/) for the web backend, and [React](https://react.dev/) for the web frontend.

Here I walk through how I built it, step by step.

## Deploy the Model

To deploy the Stable Diffusion model, I took a shortcut and used SageMaker JumpStart: `Stable Diffusion 2.1 base`, pretrained on [LAION-5B](https://laion.ai/blog/laion-5b/), is available under SageMaker JumpStart's "Foundation Models: Image Generation" ML task. It can be deployed with one click, so that's what I did!

Find the model:

![SageMaker JumpStart](images/SageMaker-JumpStart.png)

One-click deploy it (here I used all the default settings for simplicity):
![Deploy](images/Model-deploy.png)

Once the endpoint is in service, a sample notebook is provided.
![open notebook](images/Open-notebook.png)

By opening it in SageMaker Studio you can already play around with the model. Since I already had my AWS credentials and a Jupyter notebook server set up locally, I downloaded it and had some fun in my local VS Code 😆. (You can find this notebook in my GitHub repo.)

```python
# response = query_endpoint("cottage in impressionist style")
response = query_endpoint("a cat astronaut fighting aliens in space, realistic, high res")

img, prmpt = parse_response(response)

# Display hallucinated image
display_image(img,prmpt)
```

![](images/output-1.png)

## Build the API

### A Fast and Minimal Start

I've wanted to try out FastAPI because it looks so fast and has a nice Swagger experience built in. So I first installed it as instructed on the [official website](https://fastapi.tiangolo.com/#installation):

```bash
# Install fastapi as well as the ASGI server Uvicorn
$ pip3 install fastapi
$ pip3 install "uvicorn[standard]"
```


Then I created a `main.py` file under my `api/` folder and typed in the minimal example of a FastAPI app:

```python
from typing import Union

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}


@app.get("/items/{item_id}")
def read_item(item_id: int, q: Union[str, None] = None):
    return {"item_id": item_id, "q": q}
```

Then I ran the `uvicorn` dev server from my `api/` folder:

```bash
$ uvicorn main:app --reload
```

And as promised, I got a working API at `http://127.0.0.1:8000/` and my API doc Swagger UI at `http://127.0.0.1:8000/docs`, instantly 🚀!

![](images/fast-api.png)

![](images/swagger.png)

### Connect the API to the Model

Now it's time to make our API able to send a prompt to our Stable Diffusion model and get a response image. Let me first add a route and handler function for that:

```python
# api/main.py
import utils  # local helper module (api/utils.py), described below

@app.get("/generate-image")
def generate_image(prompt: str):
    image, prmpt = utils.parse_response(utils.query_endpoint(prompt))
    print(image)
    return {"out": "yeah"}
```

I took the `query_endpoint()` and `parse_response()` functions from the Stable Diffusion example notebook that SageMaker JumpStart generated for me, and put them in a `utils.py` file so that the API's `main.py` can use them.
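
For readers without the notebook at hand, here is a rough sketch of what such a `utils.py` could look like. It is not the notebook's exact code: the endpoint name is a made-up placeholder, and the content type and the response keys (`generated_image`, `prompt`) are assumptions based on what the API handler above expects back.

```python
# api/utils.py (sketch, not the notebook's exact code)
import json

# Hypothetical placeholder: use the endpoint name shown in your SageMaker console.
ENDPOINT_NAME = "my-stable-diffusion-endpoint"


def query_endpoint(text, endpoint_name=ENDPOINT_NAME):
    """Send the prompt to the SageMaker endpoint and return the raw response."""
    import boto3  # imported lazily so parse_response can be used without AWS set up

    client = boto3.client("sagemaker-runtime")
    return client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-text",  # assumed: the prompt is sent as text
        Body=json.dumps(text).encode("utf-8"),
        Accept="application/json",
    )


def parse_response(query_response):
    """Return (pixel_array, prompt) from the endpoint response."""
    # Assumed response keys: "generated_image" and "prompt".
    response_dict = json.loads(query_response["Body"].read())
    return response_dict["generated_image"], response_dict["prompt"]
```

Keeping the `boto3` import inside `query_endpoint()` means the parsing logic can be tried out with a fake response object, without touching AWS.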

After I saved the `main.py` file with the new route, the Swagger UI conveniently picked it up and provided a test UI for me to enter my prompt!

![](images/swagger-2.png)

And the model does send back the generated image! What does it look like? Well, it's an array of RGB channel values for each pixel in the image! Crazy, isn't it?

![](images/image-pixels.png)
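
To make that structure concrete, here is a small sketch that inspects such a nested list. The helper name `describe_pixels` is made up for illustration, and the layout (rows of pixels, each pixel a list of channel values from 0 to 255) is an assumption based on the output shown above.

```python
def describe_pixels(pixels):
    """Return (height, width, channels) of a nested list of pixel rows."""
    height = len(pixels)
    width = len(pixels[0])
    channels = len(pixels[0][0])
    # Every row must have the same width, every pixel the same number of
    # channels, and every value must fit in an unsigned 8-bit integer.
    for row in pixels:
        if len(row) != width:
            raise ValueError("rows have different lengths")
        for pixel in row:
            if len(pixel) != channels:
                raise ValueError("pixels have different channel counts")
            if any(not 0 <= value <= 255 for value in pixel):
                raise ValueError("channel value outside 0..255")
    return height, width, channels


# A tiny 2x2 RGB "image": red, green, blue and white pixels.
tiny = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]
print(describe_pixels(tiny))  # (2, 2, 3)
```

A real response has the same nesting, just with far more rows and columns.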

### Process and Save the Image

To turn the magic numbers into an image, I used [numpy](https://numpy.org/) and [Pillow (PIL)](https://pypi.org/project/Pillow/). For a quick test, I just added another utility function that saves the image to disk (in the folder where the API is running).

```python
# api/utils.py
from PIL import Image
import numpy as np

# ...

def save_image(pixels):
    arr = np.array(pixels, dtype=np.uint8)
    img = Image.fromarray(arr)
    img.save("new.png")
```

Then in the `/generate-image` route's handler, I passed the pixel array from the response to this new utility function, and for now sent only the prompt back as the API response.

```python
# api/main.py
# ...
@app.get("/generate-image")
def generate_image(prompt: str):
    image, prmpt = utils.parse_response(utils.query_endpoint(prompt))
    utils.save_image(image)
    return {"prompt": prmpt}
```

I tested it in the Swagger UI with a new prompt: "A unicorn astronaut in space, full body from side".

![](images/swagger-3.png)

And in the `/api` folder a `new.png` appeared - the application works! Have a look at the unicorn astronaut image generated by Stable Diffusion and saved by the application:

![](images/new.png)
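
The same request can also be sent from a small script instead of the Swagger UI. The sketch below uses only the standard library; the helper names `build_generate_url` and `generate` are made up for illustration, and it assumes the dev server is running at `http://127.0.0.1:8000` as above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "http://127.0.0.1:8000"  # the local uvicorn dev server


def build_generate_url(prompt, base=API_BASE):
    """Build the /generate-image URL with the prompt safely URL-encoded."""
    return f"{base}/generate-image?{urlencode({'prompt': prompt})}"


def generate(prompt, timeout=120):
    """Call the API and return the decoded JSON body."""
    with urlopen(build_generate_url(prompt), timeout=timeout) as resp:
        return json.load(resp)


print(build_generate_url("a cat & a dog"))
# http://127.0.0.1:8000/generate-image?prompt=a+cat+%26+a+dog
```

`urlencode` takes care of spaces and characters such as `&`, which would otherwise cut the prompt short in the query string.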

### Support CORS

For the UI to make AJAX calls to the API later, I need to enable CORS support for the API. To do that, I added FastAPI's [CORSMiddleware](https://fastapi.tiangolo.com/tutorial/cors/).

```python
# api/main.py
from fastapi.middleware.cors import CORSMiddleware

# Support CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```

## Build the UI

### Scaffolding

Now that the image generation backend is working, I'm marching on to build the frontend. For speed and simplicity, I used [Vite](https://vitejs.dev/), choosing the `React + TypeScript + SWC` variant.

![](images/vite-ui.png)

For UI components, I had wanted to try out [Chakra](https://chakra-ui.com/) for a long time. It has a nice [installation guide](https://chakra-ui.com/getting-started/vite-guide) for use with Vite.

```bash
$ npm i @chakra-ui/react @emotion/react @emotion/styled framer-motion
```

After installation, just wrap `<App />` in `<ChakraProvider>` tags in the `main.tsx` code:

```tsx
// ui/src/main.tsx
import React from "react";
import ReactDOM from "react-dom/client";
import { ChakraProvider } from "@chakra-ui/react";
import App from "./App";

ReactDOM.createRoot(document.getElementById("root")!).render(
  <React.StrictMode>
    <ChakraProvider>
      <App />
    </ChakraProvider>
  </React.StrictMode>
);
```

### Build the Minimal UI

The minimal UI needs just three components:
1. a text input for the user to enter the prompt
2. a submit button to send the prompt to the API
3. an image component to show the generated image

With Chakra UI's ready-made components `Input`, `Button`, and `Image`, it's much easier to make these look polished. For layout I used a `Container` and an `InputGroup`, and tweaked some margins here and there. For completeness I also added a `Heading`.

```tsx
// ui/src/App.tsx
// ...

return (
    <Container maxWidth={"2xl"} marginTop={30} centerContent>
      <Heading margin={8}>Image Generator</Heading>
      <InputGroup>
        <Input
          pr="4.5rem"
          value={prompt}
          placeholder="Enter your prompt"
          onChange={onInputChange}
        />
        <InputRightElement width={"6rem"}>
          <Button onClick={onButtonClick} isDisabled={isLoading}>
            Generate!
          </Button>
        </InputRightElement>
      </InputGroup>
      {imgSrc ? (
        <Box boxSize={"l"} marginTop={5}>
          <Image src="https://picsum.photos/640" />
        </Box>
      ) : null}
    </Container> 
);
```

To tweak the image size and position, I used a dynamic dummy image from https://picsum.photos/.

![](images/minimal-ui.png)

### UI States, Event Handlers, API Call

There are three pieces of dynamic state in the minimal UI app:
1. the `prompt` text, which is bound to the value of our text input.
2. the `imgSrc`, which should come from the backend and also dictates whether to render the image component at all.
3. an `isLoading` flag: for a better user experience, I'd like to show a loader and disable the "Generate!" button while the user is waiting for their image to be generated.

```typescript
// ui/src/App.tsx
function App() {
  // states
  const [prompt, setPrompt] = useState("");
  const [isLoading, setIsLoading] = useState(false);
  // we'll replace this initial value with null later, so that the image
  // component is not rendered when there is no imgSrc
  const [imgSrc, setImgSrc] = useState("https://picsum.photos/640");
  // ...
}
```

As for event handlers, first bind the text input's change event to update the prompt state with every change of the input value:

```tsx
// ui/src/App.tsx
  const onInputChange = (e: ChangeEvent<HTMLInputElement>) => {
    setPrompt(e.target.value);
  };
```

Then a click on the "Generate!" button should trigger an AJAX fetch call to the `/generate-image?prompt=` API, with the `prompt` state value as the query parameter. I wrapped the API call in basic error handling and update the loading state around it for UX:

```tsx
// ui/src/App.tsx
const onButtonClick = async () => {
    setIsLoading(true);
    try {
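      // encodeURIComponent keeps characters like "&" or "#" in the prompt
      // from breaking the query string
      const response = await (
        await fetch(
          `http://127.0.0.1:8000/generate-image?prompt=${encodeURIComponent(prompt)}`
        )
      ).json();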
      console.log(response);
    } catch (err: any) {
      // catch any runtime error
      console.log(err.message);
    } finally {
      setIsLoading(false);
    }
};
```

To enhance the user experience, let me add a spinner below the input:

```tsx
// ui/src/App.tsx
// ...
    </InputGroup>    
    {isLoading ? (
        <Spinner
          thickness="4px"
          speed="0.65s"
          emptyColor="gray.200"
          color="blue.500"
          size="xl"
          marginTop={6}
        />
      ) : imgSrc ? (
        <Box boxSize={"l"} marginTop={5}>
          {/* <Image src={`data:image/png;base64,${imageData}`} /> */}
          <Image src={imgSrc} />
        </Box>
      ) : null}
  </Container>
//...
```

At this stage, entering a prompt and clicking the "Generate!" button actually triggers the image generation API and makes an inference call to the Stable Diffusion model. To see the generated image, just check the `new.png` image in the `/api` folder!  

For example, to the prompt "A fluffy rabbit hacker coding at a macbook pro, cyberpunk", this image was saved in the `/api` folder:

![](images/new-rabbit-hacker.png)

Next up, instead of saving the image locally, our API is going to send it back to the frontend!

## Backend and Frontend Integration

This is the exciting part: our user will finally see the generated image after clicking the "Generate!" button!

### Send the Image in the API Response

So far, our API has only been sending the prompt back, while saving the image for itself. 

To get the image out of it, we'll send the image data as a base64 encoded string. So our API response JSON schema becomes:

```json
{
    "prompt": "<the prompt string from the user>",
    "img_base64": "<the generated image as a base64 data string>"
}
```

I still want to keep a copy of the generated image on the server, so I rearranged the image-related utility functions into:

1. a general util function to convert the pixel arrays from Stable Diffusion into a PIL Image
   
   ```python
   # utils.py
   def pixel_to_image(pixel_array):
       arr = np.array(pixel_array, dtype=np.uint8)
       img = Image.fromarray(arr)
       return img
   ```
2. a function to save a PIL image to the server
   
   ```python
   # utils.py
   def save_image(img, filePath="generated_images/new.png"):
       img.save(filePath)
   ```
   This one accepts a file path, and by default it saves the image as `new.png` in the `generated_images` folder. The folder must already exist, since `img.save` does not create missing directories.

3. a function to convert a PIL image to a base64 encoded image data string
   
   ```python
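   # utils.py
   import base64
   from io import BytesIO
   def image_to_base64_str(img, format="PNG"):
       buffered = BytesIO()
       img.save(buffered, format=format)
       # b64encode returns bytes; decode so the function returns a str, as its name says
       img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
       return img_str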
    ```
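
As a quick illustration of the encoding step, here is a standard-library-only sketch. The 8-byte PNG signature stands in for real image bytes, which is an assumption made purely to keep the example short.

```python
import base64
import json

# The 8-byte PNG signature stands in for real image bytes here.
raw = b"\x89PNG\r\n\x1a\n"

encoded = base64.b64encode(raw)    # a bytes object: b"iVBORw0KGgo="
as_text = encoded.decode("ascii")  # a str: "iVBORw0KGgo="

# The standard json module refuses bytes, but accepts the decoded str.
try:
    json.dumps({"img_base64": encoded})
except TypeError as err:
    print("bytes:", err)
print(json.dumps({"img_base64": as_text}))

# Decoding the text gives back exactly the original bytes.
restored = base64.b64decode(as_text)
```

`base64.b64encode` returns `bytes`, so decoding to `str` before building the response keeps the JSON payload independent of how a particular web framework happens to serialize bytes.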

Then in our API app, using these utility functions to send back both the prompt and the image data string is easy:

```python
# main.py
import datetime

@app.get("/generate-image")
def generate_image(prompt: str):
    pixel_array, prmpt = utils.parse_response(utils.query_endpoint(prompt))
    image = utils.pixel_to_image(pixel_array)

    # Timestamp without spaces or colons, so the file name is valid on every OS
    timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    utils.save_image(image, filePath=f"generated_images/{timestamp}.png")
    img_str = utils.image_to_base64_str(image)

    return {"prompt": prmpt, "img_base64": img_str}
```

So as not to overwrite the `new.png` over and over again, here I name every new image with the timestamp when it was saved.

### Show the Image in Frontend

Now that our backend sends the image data back, let's update the API call handling to:
1. extract the image data string from the response JSON
2. use the data to update our `imgSrc` state

```tsx
// App.tsx
const onButtonClick = async () => {
    setIsLoading(true);
    try {
      const response = await (
        await fetch(
          `http://127.0.0.1:8000/generate-image?prompt=${encodeURIComponent(prompt)}`
        )
      ).json();
      console.log(response);
      const lastPrompt = response["prompt"];
      const imgBase64 = response["img_base64"];
      // no space after the comma: the payload starts right after "base64,"
      setImgSrc(`data:image/png;base64,${imgBase64}`);
    } catch (err: any) {
      // catch any runtime error
      console.log(err.message);
    } finally {
      setIsLoading(false);
    }
  };
```
Feeding a base64 data URL as the `src` of an HTML `<img>` tag renders the image just as a regular image URL would. Plus, it saves the extra network round trip that fetching the image separately would need.
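
The saved round trip comes at a price: base64 turns every 3 bytes into 4 characters, so the payload grows by about a third. The sketch below shows both the data URL format and that overhead in Python. The helper name `to_data_url` is made up for illustration, and the 300,000 zero bytes are only a stand-in for a real PNG.

```python
import base64


def to_data_url(image_bytes, mime="image/png"):
    """Build a data URL that can be used directly as an <img> src."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"


print(to_data_url(b"hi"))  # data:image/png;base64,aGk=

# Size overhead: every 3 input bytes become 4 base64 characters.
payload = bytes(300_000)  # stand-in for a 300 kB image
b64_len = len(base64.b64encode(payload))
print(b64_len)  # 400000
```

For a single image per request this is a reasonable trade; for many or very large images, serving the files by URL may be the better fit.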

Now let's prompt the model to generate a llama family image. And voilà!

![](images/llama.png)

## Acknowledgements
This project was inspired by this amazing [YouTube video](https://www.youtube.com/watch?v=3l16wCsDglU). Salute to the brilliant creator [Nicholas Renotte](https://www.nicholasrenotte.com/)!