KEY DATES LOG:

5/7 - got next.js hello world working on custom domain with ec2
11/7 - set up the frontend server so that it autostarts the webpage when launched, and keeps the same IP address 
14/7 - got API from frontend VM to backend VM working
17/7 - got GET and preset image from S3 rendering on frontend
19/7 - got a runpod uploading to S3 + got aspects of runpod API working on my machine
21/7 - got MVP working: prompt returning an image on frontend

Previous experiments with training LoRAs weren't possible on the Mac, so I'm just going to run a regular Stable Diffusion model.

Based on advice from ChatGPT and my own research:

- Next.js as it is a full package and is optimal for SEO
- FastAPI (Python) for backend IF NEEDED, due to familiarity and because it's a modern technology
- TailwindCSS for styling
- EC2 for running the AI model (Stable Diffusion)
- Redis for queuing
- DynamoDB for general storage NEED A SQL DATABASE TOO
- Firebase Auth + Stripe for authorisation and billing
- CloudFront for CDN (or is this already integrated into the next.js hoster)
- ComfyUI for generating images
- GitHub Actions for CI/CD
- RDS for credits data storage

Next.js setup:

npx create-next-app@latest my-ai-app
cd my-ai-app

say yes to all options

npm run dev, to run

hosting on ec2:

launch a micro ec2 instance
ssh into it (run in bash: chmod 400 /Users/klioballiu/Desktop/RollerAI/rollerai_key.pem then  ssh -i "rollerai_key.pem" ubuntu@ec2-18-175-197-226.eu-west-2.compute.amazonaws.com),
 install node and git, run the build
 curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt-get install -y nodejs
sudo apt install git
git clone https://github.com/your-username/your-repo.git
cd to relevant folder...
npm install
npm run build
npm start

set up a reverse proxy using nginx and the domain (open the relevant ports and point the domain to the EC2 with an A record), and secure it with HTTPS using Let's Encrypt

pm2 is used to keep the app running and doesn't 'block' the ssh line

I had the right feeling that adding 3000 to inbound rules is not sound security-wise. Instead we use the nginx reverse proxy to forward traffic from the standard ports (80/443) to the app on port 3000, so 3000 never has to be opened.

need to do elastic IP on the front end EC2, but I'll do it when it's needed.

[User Web App]
     |
     v
[Frontend (EC2 1) — Next.js + Auth + Stripe]
     |
     v
[Redis Queue (hosted or self-managed)]
     |
     v
[Worker Server (EC2 2) — ComfyUI/Auto1111 + Stable Diffusion]

User Request
   ↓
Web Server (e.g., Next.js + API)
   ↓
Redis Queue  ← Monitor (sees queue length or task time) + KUBERNETES?
   ↓
Worker (on EC2) ← Autoscaler (spins more EC2s if needed) + KUBERNETES?

+--------------------+
|  User Request/API  |
+--------------------+
          ↓
+--------------------+         +--------------------+
|   API Pod (FastAPI)|  → → →  |     Redis Pod      |
+--------------------+         +--------------------+
          ↓                            ↑
   Enqueue image job          Store job in queue
          ↓                            ↓
+---------------------+   ← ← ←  +---------------------+
|  Worker Pod (Celery,|         |  KEDA (Auto-Scaler)  |
|  RQ, custom script) |         +---------------------+
+---------------------+

1 - FastAPI endpoint 
2 - Worker script + image generator
3 - Redis queue 
4 - Deploy on Kubernetes
5 - Add KEDA + stress test & autoscale

I ASSUME THIS WILL ALL RUN IN A 'virtual private cloud'

Deciding to not do image to image as it's too complex to get working right. ControlNet is definitely needed as I tried an online img2img and even when fine-tuning the image weight the results were still rubbish.

Also choosing SD1.5 as it is the lightest model

Turns out spot instances can actually last a few hours so they can be good for images - check their 'frequency of interruption'

Using PostgreSQL for data integrity for the credits system (use amazon RDS)

Choosing 'diffusers' to run SD1.5 on the EC2 as GPT says it's the best for CLI and API pipelines.

Going to try g4dn.xlarge first, then g5.xlarge if it's not enough

If I understand this right, the reason we need event-driven scaling is because an EC2 instance is needed per VM due to the block-y nature of processing a text-to-image request. OR is Kubernetes used on a global and local scale? I guess it makes sense: if I'm paying for the EC2 anyway I might as well use the full processing power, so Kubernetes will scale this LOCALLY first, and then if that's not enough to process the queue, it will recruit more EC2s. But KEDA can instead know how long the queue will be and get enough EC2s ready, which is more efficient. GPT agreed.

BUT for simplicity I might not bother running Kubernetes on the actual EC2s themselves and just accept it is inefficient for now.

KEY - will definitely have to shut down all these EC2s, too many are needed in total.
I'll just have to record videos of it working, and make notes on how it all links together.

Just going to set up elastic IP for the frontend EC2 since that's the thing many other entities link to.

Not going to use Stripe as I'm already learning interaction through an external API by using Firebase Auth, and it'll be long to actually test with money. Just going to modify database entries when needed.


I'm also setting up the launch commands so when the EC2 instance starts it runs code to get the next.js server running

setting up the launch script:

cd RollerAI/my-app
npm run build
npm start

Ended up using pm2 as launch scripts require you to create the machine after defining the launch template:

pm2 start npm --name "roller-ai" -- start

pm2 save
pm2 startup

sudo env PATH=$PATH:/usr/bin /usr/lib/node_modules/pm2/bin/pm2 startup systemd -u ubuntu --hp /home/ubuntu

That works now, next is to set up another EC2 as the worker bot and have them communicate via an API.

setting up the worker bot:

- chose 'deep learning ami neuron'
- g4dn.xlarge to start with 
- had to request an increase in 'Running On-Demand G and VT instances' to 4 (It's not getting approved currently)

OKAY I tried getting approval for spot and normal g-type EC2s and it's not getting approved.
Based on GPT recommendations I'm going to try runpod.io and GCP. I want to make sure there's still an aspect where the Redis/KEDA queue comes into use. 

Okay it's looking like I'll use runpod over GCP for simplicity, it's for short tasks so it's better, and no need for all the IAM/enterprise stuff on the google side.

Just make sure you follow security practices if you use the 'community cloud' option which is where I think the spot pricing stuff is.

Right now the planned system diagram is looking something like this:

+----------------+             +----------------+
| Firebase Auth  | <---------> |   Credits DB   |
+----------------+             +----------------+
          ^                             ^
          |                             |
          v                             v
     +---------------------------------------+
     |              Frontend VM              |
     +---------------------------------------+
          ^                        |   ▲
          |                        v   |
          |              +---------------------+
          |              |    Redis + KEDA     |
          |              +---------------------+
          |                        |
          |                        v
          |              +---------------------+
          |              |       RunPod        |
          |              |  +---------------+  |
          |              |  |     Pod 1     |  |
          |              |  +---------------+  |
          |              |  |     Pod 2     |  |
          |              |  +---------------+  |
          |              |  |     Pod 3     |  |
          |              |  +---------------+  |
          |              +---------------------+
          |                        |
          |                        v
          |              +---------------------+
          +------------- |         S3          |
                         +---------------------+

Runpod:

- need to decide which of the services I will use, I know I will use the cheapest option which is the RTX A5000.
- I will use 'Secure Cloud' not 'Community Cloud' as community just seems like an unreliable and inconsistent mess. The three options remaining are: pod spot; pod normal hourly; serverless

So I've decided to use pod spot. The jobs are very short and I want to implement the try/except logic anyway. Also I'm not choosing serverless due to price and the fact that there'd be no point for the Redis/KEDA stuff. Going to pause this stuff and get Firebase/credits working first.

Firebase + credits:

Key for SQL - have a normal id as the primary as it makes SQL operations quicker, and use 'UNIQUE NOT NULL' for unique identifiers so your database maintains integrity.
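
A minimal sketch of that key layout, using Python's built-in sqlite3 as a stand-in for the PostgreSQL database on RDS so it runs anywhere. The table name `users`, the column names, and the helpers `add_user` and `spend_credit` are all made up for illustration; the real schema may differ.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL on RDS (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id           INTEGER PRIMARY KEY,        -- plain numeric id as the primary key
        firebase_uid TEXT UNIQUE NOT NULL,       -- external identifier, must be unique
        credits      INTEGER NOT NULL DEFAULT 0 CHECK (credits >= 0)
    )
    """
)


def add_user(uid: str, credits: int) -> int:
    """Insert a user and return the generated numeric id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO users (firebase_uid, credits) VALUES (?, ?)",
            (uid, credits),
        )
    return cur.lastrowid


def spend_credit(uid: str) -> bool:
    """Take one credit if the user has any left.

    The balance check and the decrement happen in a single UPDATE,
    so two requests cannot both spend the same last credit.
    """
    with conn:
        cur = conn.execute(
            "UPDATE users SET credits = credits - 1 "
            "WHERE firebase_uid = ? AND credits >= 1",
            (uid,),
        )
    return cur.rowcount == 1
```

Inserting the same `firebase_uid` twice raises `sqlite3.IntegrityError`, which is the integrity guarantee that 'UNIQUE NOT NULL' is there to provide.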

I initially thought that once I get the token from Firebase it just gives me an ID I use to search in the database, but you need to verify the token before you allow it to access a DB, and in general it's not good practice to expose a database to the frontend. The SDK will live on the same server as the frontend server. I'll have to use pm2 to have the two systems working simultaneously.

Using Amazon RDS to host a PostgreSQL database.

ACTUALLY I realised I need to do authentication at the end because otherwise I can't test Redis.

New system diagram:


+----------------+             +----------------+
| Firebase Auth  |             |   Credits DB   |
+----------------+             +----------------+
          ^                             ^
          |                             |
          v                             v
     +---------------------------------------+
     |              Frontend VM              |
     |         --------------------          |
     |           Firebase Auth SDK           |
     +---------------------------------------+
          ^                        |   ▲
          |                        v   |
          |              +---------------------+
          |              |    Redis + KEDA     |
          |              +---------------------+
          |                        |
          |                        v
          |              +---------------------+
          |              |       RunPod        |
          |              |  +---------------+  |
          |              |  |     Pod 1     |  |
          |              |  +---------------+  |
          |              |  |     Pod 2     |  |
          |              |  +---------------+  |
          |              |  |     Pod 3     |  |
          |              |  +---------------+  |
          |              +---------------------+
          |                        |
          |                        v
          |              +---------------------+
          +------------- |         S3          |
                         +---------------------+

Ok I reiterated to GPT I just want two pods, and a Python script was suggested instead. I'm going to try to figure it out myself, because I want some type of hysteresis where it spins up the second one when the queue is bigger than 4 but then only stops it when the queue is less than 2. It's better too since it will get me to MVP quicker. Also removed the arrow between Firebase Auth and the Credits DB.
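
The hysteresis rule above can be sketched as a single decision function. This is only an illustration of the thresholds mentioned (start above 4, stop below 2); the function name and the sample queue lengths are made up, and the actual starting/stopping of a pod is left out.

```python
def second_pod_should_run(queue_length: int, currently_running: bool,
                          start_above: int = 4, stop_below: int = 2) -> bool:
    """Decide whether the second pod should be running.

    Start when the queue is longer than start_above, stop only once it
    drops below stop_below, and otherwise keep the current state.
    """
    if queue_length > start_above:
        return True
    if queue_length < stop_below:
        return False
    return currently_running  # inside the dead band: no change


# Walk a made-up sequence of queue lengths through the rule.
running = False
history = []
for length in [0, 3, 5, 4, 2, 1, 3]:
    running = second_pod_should_run(length, running)
    history.append(running)
print(history)
```

The gap between the two thresholds is what stops the second pod from flapping on and off when the queue hovers around a single cut-off value.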

Also I thought more about how I'll actually get the image displayed. My thought was that I generate a unique ID with the prompt, then I poll the s3 waiting for the image to be generated that has the id. But S3 URLs are random, it's costly to constantly poll s3, and there's no timeout/error strategy. 

After conversation with GPT, the new strategy is: runpod uploads to an S3 URL and sends a 'complete' signal back to the Redis EC2. The frontend constantly polls the Redis hash (via the backend) to check whether the relevant job ID is marked complete, and when it is, the backend grabs the stored value, which is the URL, and sends it to the frontend, which loads the right image.

So this means the arrow between runpod and the redis VM goes both ways now. Going to upgrade to a t3.medium just in case.

New system diagram is below:

+----------------+          +----------------+
| Firebase Auth  |          |   Credits DB   |
+----------------+          +----------------+
          ^                          ^
          |                          |
          v                          v
+---------------------------------------+
|              Frontend VM              |
|         --------------------          |
|           Firebase Auth SDK      (EC2)|
+---------------------------------------+
     ^                        |   ^
     |                        v   |
     |    +------------------------------+
     |    | Redis + Python script   (EC2)|
     |    +------------------------------+
     |                        |  ^
     |                        v  |
     |              +---------------------+
     |              |       RunPod        |
     v              |  +---------------+  |
+------------+      |  |     Pod 1     |  |
|     S3     |<-----|  +---------------+  |
+------------+      |  |     Pod 2     |  |
                    |  +---------------+  |
                    +---------------------+



This is the work sequence I'm going to follow, I'm pretty sure that my system diagram is set in stone now:

- set up t3.medium for redis system with ElasticIP 
- work on getting string input with text box that sends to redis backend side
- then get the API working so runpod sends to s3
- create redis queue so multiple requests can happen at the same time
- create redis hash and setup APIs and frontend code so image appears on frontend
- add on your python script so tasks are balanced between two workers spun up as necessary with hysteresis
- set up firebase auth so user has to log in before being able to request an image
- add a credits system linked to RDS

So the Python pseudo-equivalent of pm2 is a systemd service that runs uvicorn:

sudo nano /etc/systemd/system/fastapi.service
inside it write this

[Unit]
Description=FastAPI app with Uvicorn
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/RollerAI/redis_vm
ExecStart=/usr/bin/env uvicorn scaler:app --host 0.0.0.0 --port 8000
Restart=always

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reexec
sudo systemctl daemon-reload
sudo systemctl enable fastapi.service
sudo systemctl start fastapi.service

Adding CORS to try to get communication working: this wasn't the issue. GPT suggested using the console in Chrome, and I see it's because although the frontend is on HTTPS, the backend is on HTTP, so I get a 'mixed content' error.

I solved this by using API Gateway. The thing is, to get an HTTPS certificate you need a custom domain so the certificate authority can verify you control it, which would mean more nginx setup and setting records. In API Gateway I had to set the CORS rules and link it to the backend. Then in the frontend code I send requests to the Amazon URL instead of my backend IP.

A Redis hash is better than a Python dict here because it lives in the Redis server process rather than in my app, so it survives app restarts (and Redis can persist it to disk). The hash is also more easily shareable and accessible across machines. Finally, Redis hashes allow multiple fields under one main key, which works better for my layout, like below:

jobs = {
    "job:123": {
        "status": "done",
        "s3url": "...",
        "prompt": "..."
    }
}
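
The layout above maps onto Redis hash commands roughly as follows. This is a sketch, not the project's actual code: the helper names and the "queued" status are made up, and only the `job:<id>` key format and the status/s3url/prompt fields come from the layout above. The helpers accept any client offering `hset(name, mapping=...)` and `hgetall(name)`, which redis-py's `Redis` client provides; the small in-memory class is only a stand-in so the example runs without a Redis server.

```python
from typing import Dict, Optional


class InMemoryHashStore:
    """Stand-in exposing the two hash calls used below (hset with mapping, hgetall)."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def hset(self, name: str, mapping: Dict[str, str]) -> int:
        fields = self._data.setdefault(name, {})
        added = sum(1 for key in mapping if key not in fields)
        fields.update(mapping)
        return added  # number of newly created fields

    def hgetall(self, name: str) -> Dict[str, str]:
        return dict(self._data.get(name, {}))


def create_job(store, job_id: str, prompt: str) -> None:
    """Record a new job with its prompt and an initial status."""
    store.hset(f"job:{job_id}",
               mapping={"status": "queued", "prompt": prompt, "s3url": ""})


def mark_done(store, job_id: str, s3url: str) -> None:
    """Called once the image is uploaded: store the URL and flip the status."""
    store.hset(f"job:{job_id}", mapping={"status": "done", "s3url": s3url})


def get_job(store, job_id: str) -> Optional[Dict[str, str]]:
    """Return all fields of a job, or None if the job id is unknown."""
    job = store.hgetall(f"job:{job_id}")
    return job or None
```

With a real server, `store` would be something like `redis.Redis(host=..., decode_responses=True)`; without `decode_responses=True`, redis-py returns bytes rather than strings.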

So now we define a GET so the frontend server can poll the backend periodically - WebSockets is another solution for this. I have to add this route to API Gateway.

I think a key thing with web development/apps is to have detailed error codes, timeouts, retry limits, try/except logic, etc. 
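
As an illustration of the timeout, retry limit, and try/except ideas above, here is a sketch of a polling loop written in Python (the real frontend polls from React, so this is only the logic, not the project's code). The function name, the exception classes, and the "failed" status are assumptions; `fetch_status` is whatever callable performs the GET and returns the job's fields.

```python
import time
from typing import Callable


class JobFailed(Exception):
    """The backend reported that the job failed."""


class JobTimedOut(Exception):
    """The job did not finish before the deadline."""


def wait_for_image(fetch_status: Callable[[], dict],
                   timeout_s: float = 60.0,
                   interval_s: float = 1.0,
                   max_errors: int = 3,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the job is done and return its S3 URL."""
    deadline = clock() + timeout_s
    consecutive_errors = 0
    while clock() < deadline:
        try:
            job = fetch_status()
        except OSError:
            # Network-level failure (e.g. connection refused): retry a few times.
            consecutive_errors += 1
            if consecutive_errors > max_errors:
                raise
        else:
            consecutive_errors = 0
            if job.get("status") == "done":
                return job["s3url"]
            if job.get("status") == "failed":
                raise JobFailed(job.get("error", "unknown error"))
        sleep(interval_s)
    raise JobTimedOut(f"job not finished after {timeout_s} seconds")
```

Passing `sleep` and `clock` in as parameters is just a convenience so the loop can be exercised without real waiting.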

To parse the incoming POST request, we use Request object from fastapi to dissect and inspect the information.

Will be readjusting the work sequence so it makes a bit more sense, and I can kind of 'finish' the frontend part up until MVP stage:

- Redis hash working so it stores input string and status, and get GET polling + image rendering working so it displays a set S3 image on the frontend
- Runpod API uploading to S3 with passed through strings + generated image based on prompt rendering on frontend (MVP)
- Redis queue integrated and working
- Python hysteresis script with two pods
- Firebase Auth + SDK
- Credits system (VP)

It makes sense why people use CI/CD and automated testing now. I'm going to take a small pause to set up automatic git pulling and building+running of my frontend and backend EC2 code.

The process seems to be: make a bash file that automates the commands, then set up a systemd service to run it on startup, then configure/run the daemon.

Frontend is now properly receiving the Job ID. Next step is to get GET request working, and to implement code that displays an S3 URL.

GET request:

Note for GET requests on the frontend you need to put the variable you want in the request URL.

We need to:
- Add the route in the backend that will return the S3 URL when it exists, for now we'll just return a pre-existing link to a 1024x1024 photo
- In the frontend add code which will periodically poll until the link is received (for now it will return instantly)

Side note: I learnt that React scripts are event-driven and declarative, they don't run sequentially/procedurally like Python files. Each piece of code runs based off hooks/event handlers, i.e. some type of event. Also, all code must be in one of these hooks. Just be careful because even within functions some things are asynchronous, like the 'setX' from useState, so you can set something but it's not actually set yet.

For s3 to display on the frontend I had to do the CORS settings, but in s3 you also have to specify access settings even if public access is enabled due to the multi-layer security it uses.

Next goal is to get runpod generating images and displaying on frontend, which will lead to MVP:

- adjust backend code
- learn the API and code it in FastAPI (make a separate file if needed to keep it clean)
- use the cheapest GPU and SDXL
- ensure the pods shut down after image generation
- ensure the API key is hidden from the repo

Okay I realised if I still want to return the Job ID and info about queue, I need to start the runpod task in a nonblocking way to allow the POST request to return the Job ID. The solution to this is a queue or a background task (BackgroundTasks in FastAPI). Since I'm going to do the queue anyway I'll just try implement the queue and the API all in the same step. The same 4 points from above apply. The task list now looks like this:

- Tasks added onto a Redis queue + Runpod API working and sending images to S3 + generated image based on prompt rendering on frontend (MVP)
- Python hysteresis script with two pods launching and shutting down as needed, including starting and shutting down both when queue is zero
- Firebase Auth + SDK
- Credits system

I'm just doing some sketches, reading the runpod help docs, and conversing with GPT, trying to understand the layout of all this and how it fits together. Currently I think I'm thinking of this the wrong way.

Ok I think I've wrapped my head around it. We deploy a 'worker' script on each of the pods which pulls jobs off a centrally hosted queue, instead of having code in a central location trying to work outwards and coordinate with different pods. So the workers just take a job when they're ready (and execute the relevant generation function) and submit it, and Redis handles things like retries and distributing info itself. Then the hysteresis script works by simply activating and deactivating the workers. Finally, instead of writing API code to create workers, since it's just two workers I will just configure them in the Runpod website to keep things clean.

Below is a diagram I made for this specific interaction.

                        
                          +======+
        ----------------->|  S3  |<-----------------
        |                 +======+                 | 
        |                                          |
+================+                        +================+   
|     POD 1      |   +================+   |     POD 2      |
|----------------|<--|   Autoscaler   |-->|----------------|
|    worker.py   |   +================+   |    worker.py   |   
+================+           ^            +================+
          ^                  |                  ^
          |                  |                  |
          v                  v                  v
+--------------------------------------------------------+
|                     Redis Queue                        |
+--------------------------------------------------------+
                            ^
                            |
                    +---------------+
                    |    Backend    |
                    +---------------+
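
The pull-based interaction in the diagram above can be sketched with the standard library alone. Here `queue.Queue` stands in for the Redis queue, a dict stands in for the Redis hash, threads stand in for the two pods, and `fake_generate` stands in for the image generation; all names are made up for illustration. It also shows the non-blocking part: `submit` returns the job ID straight away and the work happens elsewhere.

```python
import queue
import threading
import uuid

job_queue = queue.Queue()        # stand-in for the Redis queue
results = {}                     # stand-in for the Redis hash of jobs
results_lock = threading.Lock()


def submit(prompt: str) -> str:
    """Backend side: record the job, enqueue it, return the ID immediately."""
    job_id = uuid.uuid4().hex
    with results_lock:
        results[job_id] = {"status": "queued", "prompt": prompt}
    job_queue.put(job_id)
    return job_id


def fake_generate(prompt: str) -> str:
    """Placeholder for the real image generation + S3 upload."""
    return f"image for {prompt!r}"


def worker(name: str) -> None:
    """Pod side: take a job whenever one is available, until told to stop."""
    while True:
        job_id = job_queue.get()
        if job_id is None:           # sentinel: shut this worker down
            job_queue.task_done()
            break
        with results_lock:
            prompt = results[job_id]["prompt"]
        output = fake_generate(prompt)
        with results_lock:
            results[job_id].update(status="done", worker=name, output=output)
        job_queue.task_done()


workers = [threading.Thread(target=worker, args=(f"pod{i}",), daemon=True)
           for i in (1, 2)]
for w in workers:
    w.start()

job_ids = [submit(p) for p in ["a cat", "a dog", "a boat"]]
job_queue.join()                 # wait until every submitted job is processed
for _ in workers:
    job_queue.put(None)          # one sentinel per worker
for w in workers:
    w.join()
```

Neither worker is told which job to take; whichever is free picks up the next one, which is why adding or removing a pod needs no change on the backend side.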

https://console.runpod.io/explore/r7rgu7rksq - going to configure the two runpods with this to try start with.

So configuring these two pods has made me realise a few things about these types of virtual pods/machines, especially when it comes to spot instances. Even these 'pods' which are 'GPUs' come with ephemeral storage, RAM, and vCPUs to actually make them a usable package. You can store stuff and install stuff on the ephemeral storage, running it on any processors attached to the VM. You can also attach disk storage, which allows you to store stuff outside the 'VM package'. This comes into use especially with spot instances, as if your spot instance gets interrupted you lose anything in the package. The disk storage allows you to connect data quickly to any other VM in the network i.e. the whole cloud. 
So when it comes to spot instances, you need disk/volume storage to allow you to store models locally and any small configurations which will mean a somewhat quicker startup instead of a full download and reinstall.

For example initial install took about 4 minutes, whereas reboot with storage took 1m30s. 
This means that I'll just keep one on and have the second one scale as almost 2 minutes is still too long.

Also, I'm realising that the logic to control these instances can be more complex than initially seems. For example, I can have pod1 as the base one and pod2 as the reserve, but what if pod1 goes down? Then pod2 will wait for a larger queue when it should start working automatically. Also each pod might not be fully utilised (each pod consists of the system, and the container running workflows, but with docker the container is the docker instance and then it runs SDXL in each container). This is where kubernetes/docker comes in as once configured, it manages this stuff automatically.

Just need to make sure I configure each VM the same.

Swapping to normal instances: spot kept getting interrupted, and I think it would require a lot of work with respect to restarts and waits. I can still include fault tolerance by adjusting the Redis queue and worker code.

On further thought I will have to use a basic PyTorch template and run off HuggingFace Diffusers as I'll need an API/script-based interaction. This takes reboot time down to around 10 seconds so I can definitely have a 3-stage hysteresis now.


For best practices, I created an IAM role for uploading to s3 which I'll assign to these two bots, just need to figure out how this works command line/access-wise

Going to log all steps taken in setting up this pod as will have to transfer it to the next one:

- pip install diffusers transformers accelerate safetensors
- created folder rollerai_generate, inside it created 'runpod_worker.py' and pasted in from my repo

then:
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
apt update && apt install -y unzip
unzip awscliv2.zip
./aws/install -i $HOME/.aws-cli -b $HOME/.local/bin
export PATH=$HOME/.local/bin:$PATH
echo 'export PATH=$HOME/.local/bin:$PATH' >> ~/.bashrc
source ~/.bashrc


useful commands to debug backend: 
/bin/bash /home/ubuntu/start_scaler.sh
uvicorn main:app --host 0.0.0.0 --port 8000
systemctl status fastapi.service

Then configuring AWS:
aws configure
paste in the two keys, region eu-west-2
default output as json

Also:
pip install boto3

Image generation now working and uploading to S3, next step is to replicate results on the second pod.

To get to MVP, these are the remaining steps:

1) set up Redis comms between both workers and queue
2) modify backend shell script and setup both systemd services (main and autoscaler)
3) set up startup scripts and services on the two workers
4) modify frontend code as needed
5) stress test to confirm system working as intended using multiple windows/devices

Made some modifications to the relevant main, worker, and autoscaler files. They're untested but the next step is to get comms working between the backend and the two pods. Didn't know you could explore API commands from the command line with 'dir' but it helped me set up the API - the documentation was terrible. The same command works for regular python libraries
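
A small example of that kind of exploration from a Python prompt. The standard-library `json` module is used here only as a stand-in for whichever library is being explored.

```python
import inspect
import json

# Public names exposed by a module: filter out the underscore-prefixed ones.
public_names = [name for name in dir(json) if not name.startswith("_")]
print(public_names)

# Once a name looks promising, check how it is meant to be called.
signature = inspect.signature(json.dumps)
print(signature)

# The first line of the docstring is usually a one-sentence summary.
summary = (json.dumps.__doc__ or "").strip().splitlines()[0]
print(summary)
```

`dir()` works on any object, so the same approach applies to a client instance or a submodule as well as to the top-level package.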

getting Redis communication working between workers and queue:

first obvious step is to open the redis TCP port on both pods.
Then we open the same on the backend ec2

in the Redis config, changed the bind IP to 0.0.0.0, disabled protected mode, and set a password

updated python files using redis to have password term
----
To check connection is successful run this from each of the pods:
redis-cli -h "18.132.136.177" -p 6379 -a "redis123" ping

I then created a .service file so autoscaler also starts on system startup, then ran the normal commands

sudo systemctl daemon-reexec
sudo systemctl daemon-reload
sudo systemctl enable main-backend.service autoscaler.service
sudo systemctl start main-backend.service autoscaler.service

I then modified the shell script to add this line 'python3 autoscaler.py &'
The '&' allows it to run in the background

On both pods the worker script now runs and seems to be listening to the default Redis queue

Just dealing with the autoscaler needing the key - don't want to put it on git and system environment/OS variables aren't working, just going to put it in a text file in the root
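
One way to keep the key out of the repo while still supporting both options is to try an environment variable first and fall back to a file. This is a sketch under assumptions: the variable name `RUNPOD_API_KEY`, the file name `runpod_key.txt`, and the function name are all made up. A possible reason the environment variable route fails is that systemd services do not read ~/.bashrc, so anything exported there is invisible to them.

```python
import os
from pathlib import Path


def load_api_key(env_var: str = "RUNPOD_API_KEY",
                 key_file: str = "runpod_key.txt") -> str:
    """Return the API key from the environment, or from a text file as a fallback."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key

    path = Path(key_file)
    if path.is_file():
        key = path.read_text().strip()  # strip the trailing newline editors add
        if key:
            return key

    raise RuntimeError(
        f"No API key found: set {env_var} or create {key_file}"
    )
```

If the text file route is used, the file name should be listed in .gitignore so it cannot be committed by accident.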

Getting 'deserialisation' issues with redis, going to try removing password and removing the GPT suggested deserialisation argument

Getting 'Cannot re-initialize CUDA in forked subprocess'. I tried a GPT-suggested solution, but after checking the RQ (Redis Queue) GitHub pages, I think this 'SimpleWorker' might avoid the forking issue!

Swapped to SD1.5 as was having GPU memory issues.

I'm spending so many hours trying to get this frontend displaying the relevant S3. The GET succeeds, but console logs don't work around it for some reason. I've tried a lot of things, but I'm going to try this wrapping thing GPT says as a final option

Okay so the issue was multiple pm2 instances running which caused a very glitchy and old but stable build. This just highlights the usefulness of DevOps. What a pain. I now have MVP, but need to have a way to get these pods up and running quicker after they are stopped i.e. installing dependencies, setting up AWS credentials

Starting again on the project after a two-week pause to work on other stuff.

The remaining tasks are:

1) set up startup scripts and services on the two workers
2) set up retries on Redis Queue
3) stress test to confirm system working as intended using multiple windows/devices
4) set up Firebase Auth + SDK with relevant frontend changes
5) set up credits system with relevant frontend changes

Recreating the two runpods as they got deleted due to low funds.
Now I know everything needs to be in /workspace for it to be persistent when the pod is stopped.

- open tcp port 6379 on both runpods

Okay I found out runpod has a startup command for the container, so this can just run my shell script and no need to configure background daemons. All I do is setup the folder and AWS credentials and it should work.

So in the 'start command' for both this is the command:
bash /workspace/startup.sh && bash
So the above didn't allow the actual docker pod to run. I figured out the start command actually has an 'exec' prefix automatically, and the container's own start script is /start.sh.

Instead we do the below:
((exec)) bash -c 'bash /workspace/RollerAI/redis_vm/startup.sh && /start.sh'

We can't run /start.sh first, as nothing after it will run. Also, 'exec bash' is fine, but before we were technically doing 'exec exec', which isn't.

What a pain - I think I need to be careful uploading poorly written shell scripts, it kind of 'soft-locks' the folder and I have to manually git pull to fix it. A basic shell script is working on startup for pod 1 now

Obviously I need to balance doing things manually versus scripting them.

Since the packages will be in workspace, we need a line in the script to ensure the PATH is set to the workspace area

- export PATH=/workspace/.local/bin:$PATH (this sets the PATH to the area where the packages are installed)

side note to install nano: apt update && apt install -y nano

FULL PROCESS FOR CONFIGURING A RUNPOD:

- up container volume to 30GB
- open TCP port 6379
- cd workspace, then clone the repo
- add a launch command: bash -c 'bash /workspace/RollerAI/redis_vm/startup.sh && /start.sh'
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
apt update && apt install -y unzip
unzip awscliv2.zip
./aws/install -i /workspace/.aws-cli -b /workspace/.local/bin
python3 -m venv /workspace/venv
source /workspace/venv/bin/activate
pip install diffusers transformers accelerate torch boto3 redis rq
(the below are GPT suggestions to fix the uint issue - my old setup used to work)
pip install "transformers<4.40" "safetensors<0.4.0"
pip install --upgrade torch --index-url https://download.pytorch.org/whl/cu118
/workspace/.local/bin/aws configure and type in your keys
