
Deploying on Rancher #357

Open
David-Development opened this issue Sep 14, 2018 · 10 comments
@David-Development

Issue:

When deploying Reana-Cluster onto a Rancher Kubernetes Cluster, I'm running into some certificate issues. Kubectl, on the other hand, still works without problems.

...
HTTPSConnectionPool(host='192.168.1.10', port=8443): Max retries exceeded with url: 
/k8s/clusters/c-rqbzb/api/v1/namespaces/default/secrets?includeUninitialized=false
(Caused by SSLError(CertificateError("hostname '192.168.1.10' doesn't match '192.168.1.10'",),))

Rancher is using port 8443, and the k8s API is available at https://192.168.1.10:8443/k8s/clusters/c-rqbzb. I am able to access the URL https://192.168.1.10:8443/k8s/clusters/c-rqbzb/api/v1/namespaces/default/secrets in my browser.
The certificate for Rancher is auto-generated (self-signed). Could this be the problem? By the way, my kube-config file contains the certificate-authority-data section, and kubectl is not complaining about any SSL issues.

I'm trying to start my REANA cluster with the following commands:

openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout /tmp/tls.key -out /tmp/tls.crt \
    -subj "/CN=192.168.1.10"

./kubectl delete secrets reana-ssl-secrets
./kubectl create secret tls reana-ssl-secrets \
      --key /tmp/tls.key --cert /tmp/tls.crt

reana-cluster init # <-- exception occurs here
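A note on the error string (my reading, not confirmed in this thread): Python's hostname verification generally wants an IP address to appear as a subjectAltName entry, and certificates that only carry the IP in the CN can produce exactly this kind of "hostname doesn't match" message even though the two strings look identical. The failing handshake here is against the Rancher endpoint on 192.168.1.10:8443, not the reana-ssl-secrets certificate created above. For reference, a self-signed certificate carrying an IP SAN can be generated like this (requires OpenSSL 1.1.1+ for -addext):

# same as above, but with the IP also present as a subjectAltName
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout /tmp/tls.key -out /tmp/tls.crt \
    -subj "/CN=192.168.1.10" \
    -addext "subjectAltName=IP:192.168.1.10"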

Steps to reproduce:

  1. Run the Rancher UI:
# run rancher
docker run -d --name=rancher --restart=unless-stopped -p 8080:80 -p 8443:443 rancher/rancher:v2.0.8
  2. Log in (https://localhost:8443), create a new cluster ("custom") --> leave default settings, just click on "next"
  3. Make sure to check "etcd", "Control Plane" and "Worker"
  4. Copy the generated command into the CLI
  5. Wait until the cluster is initialized, click on "Kubeconfig file" and place the content into ~/.kube/config
  6. Run the commands shown in the issue section above
@diegodelemos
Member

Hello @David-Development, first of all, sorry for the late reply... I have managed to deploy REANA on Rancher following your steps. I've taken the Kubernetes configuration from the Rancher UI and copied it over to ~/.kube/config.

[screenshot: Kubeconfig file shown in the Rancher UI]

And it looks more or less like this:

apiVersion: v1
kind: Config
clusters:
- name: "reana"
  cluster:
    server: "https://localhost:8443/k8s/clusters/c-x77qs"
    api-version: v1
    certificate-authority-data: "~~~~~~~"

users:
- name: "user-~~~~"
  user:
    token: "~~~~~~~~~~~~"

contexts:
- name: "reana"
  context:
    user: "user-~~~~"
    cluster: "reana"

current-context: "reana"
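As a quick sanity check that this copied kubeconfig is the one kubectl actually picks up (general kubectl usage, not something from the thread):

kubectl config current-context   # should print "reana"
kubectl get nodes                # should list the Rancher cluster nodes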

Right after that I just run reana-cluster init and all components are initialised correctly.

[screenshot: all REANA components initialised]

Regarding accessing the services from outside the cluster, I have tried getting the address reserved for the reana-server component from the UI and curling it, but I get a timeout:

$ curl http://192.168.65.3:32121/
curl: (7) Failed to connect to 192.168.65.3 port 32121: Operation timed out

This seems to be a problem that could be solved with some Rancher experience. Did you manage to get it working?
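For reference, one way to look up where a NodePort service is actually reachable (a sketch on my part, assuming the service and pods are labelled app=server as in the later comments):

# node hosting the reana-server pod and the NodePort assigned to its service
kubectl get pods -l app=server -o jsonpath='{.items[0].spec.nodeName}'
kubectl get service server -o jsonpath='{.spec.ports[0].nodePort}'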

@diegodelemos
Member

Something important to note which I had forgotten before: you should use reana-cluster in this version if you are using REANA 0.3.0.

Regarding fully running REANA inside Rancher, as a workaround for the issue of not being able to access services from outside the cluster and to make sure that things are working I have run the reana-client inside the cluster as follows:

  1. Log into the reana-server component:
$ kubectl exec -ti server-657b47685b-ltm8d bash
>
  2. Install reana-client and configure it; to retrieve the access token you can use reana-cluster env --include-admin-token.
> pip install reana-client
> export REANA_SERVER_URL=http://localhost:5000
> export REANA_ACCESS_TOKEN=FIXME
  3. And then clone the hello world example locally and run it.
> cd /tmp/
> git clone https://github.com/reanahub/reana-demo-helloworld
> cd reana-demo-helloworld/
> reana-client create
> export REANA_WORKON=workflow.2
> reana-client upload
> reana-client start
> reana-client status
> reana-client download
> cat results/greetings.txt
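As an alternative to exec'ing into the pod (a common Kubernetes technique, not something suggested in this thread), the server could also be reached from the host via port forwarding; the pod name below is just the example one from above:

# forward local port 5000 to port 5000 of the reana-server pod
kubectl port-forward server-657b47685b-ltm8d 5000:5000 &
export REANA_SERVER_URL=http://localhost:5000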

@David-Development
Author

@diegodelemos Thank you for your help! I ran the script again today (using the latest version) and the problem was gone. I think the "timeout" occurs because the service is not running on that port anymore? On my cluster the service was migrated from one node to another a couple of times, and the port changed every time. I wrote a small script to automate the connection call (I'm using it in a Docker container). Maybe this will be helpful for someone.

KUBE_REANA_POD_NAME=$(kubectl get pods -l app=server -o=custom-columns=:.metadata.name | tr -d '\n')
KUBE_REANA_NODE_NAME=$(kubectl get pods -l app=server -o=custom-columns=:.spec.nodeName | tr -d '\n')
KUBE_REANA_SERVICE_PORT=$(kubectl get service server -n default -o=custom-columns=:.spec.ports[].nodePort | tr -d '\n')

# extract cluster-url and port
reana-cluster env --include-admin-token > ./exports.sh
source ./exports.sh
export REANA_SERVER_URL=http://${KUBE_REANA_NODE_NAME}:${KUBE_REANA_SERVICE_PORT}/

echo "REANA_SERVER_URL: $REANA_SERVER_URL"
echo "REANA_ACCESS_TOKEN: $REANA_ACCESS_TOKEN"

# run sample workflow
WORKFLOW_NAME="helloworld-`date +%s`"
git clone https://github.com/reanahub/reana-demo-helloworld
cd reana-demo-helloworld/

reana-client create --name ${WORKFLOW_NAME} --skip-validation
export REANA_WORKON=${WORKFLOW_NAME}
reana-client upload
reana-client start
reana-client status
reana-client status
reana-client download
cat results/greetings.txt

As I can't use an Ingress Controller, setting a hostPort would be useful to me. However, updating the service deployed by REANA does not work: kubectl says service/server patched (no change), but nothing changes on the cluster. Below you can find the command I used to set a hostPort; the second command is meant to deploy the REANA service on the master node (so that the IP doesn't change all the time). Any ideas how to work around this? I even tried to set this in the Kubernetes Dashboard, but as soon as I hit "update", the old config is back.

kubectl patch service server -n default --type='json' -p='[{"op": "add", "path": "/spec/ports/0/hostPort", "value": 54321}]'
kubectl patch service server -n default --type='json' -p='[{"op": "add", "path": "/spec/nodeSelector", "value": { "node-role.kubernetes.io/etcd": "true" }}]'

@jordidem

Dear David, Diego,

We were following your instructions in order to deploy the reana-cluster on a Rancher Kubernetes cluster (Rancher v2.2.4, Kubernetes v1.13.5, reana-cluster 0.5.0, Python 2.7).
[root@reana-server-test ~]# helm version
Client: &version.Version{SemVer:"v2.13.0", GitCommit:"79d07943b03aea2b76c12644b4b54733bc5958d6", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.13.0", GitCommit:"79d07943b03aea2b76c12644b4b54733bc5958d6", GitTreeState:"clean"}

  • Tiller version v2.13.0

So these were our steps:

  1. Launch Rancher
    docker run -d --name=rancher --restart=unless-stopped -p 8080:80 -p 8443:443 rancher/rancher:latest
  2. login (https://localhost:8443), create a new cluster ("custom") --> leave default settings, just click on "next"
  3. make sure to check "etcd", "Control Plane" and "Worker"
  4. copy generated output command into cli
  5. wait until cluster is initialized, click on "Kubeconfig file" and place the content into ~/.kube/config
  6. openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /tmp/tls.key -out /tmp/tls.crt -subj "/CN=our IP"
  7. ./kubectl delete secrets reana-ssl-secrets
  8. ./kubectl create secret tls reana-ssl-secrets --key /tmp/tls.key --cert /tmp/tls.crt
  9. We create & activate the virtual environment picreana
  10. We install reana-cluster with pip install reana-cluster
  11. We have updated the file reana-cluster.yaml, changing only two lines (version and reana_url) from the provided template:
cluster:
  type: "kubernetes"
  # version: "v1.14.0"
  version: "v1.13.5"
  db_config: &db_base_config
    - REANA_SQLALCHEMY_DATABASE_URI: "postgresql+psycopg2://reana:reana@db:5432/reana"
  root_path: "/var/reana"
  shared_volume_path: "/var/reana"
  # reana_url: "reana-dev.cern.ch"
  reana_url: "reana-server-test.pic.es"

  12. Then we run reana-cluster --debug -f reana-cluster.yaml init --traefik and we are stuck with the following error:

(picreana) [root@reana-server-test configurations]# reana-cluster --debug -f reana-cluster.yaml init --traefik
[ERROR] Got an unexpected keyword argument 'include_uninitialized' to method list_namespaced_secret
Traceback (most recent call last):
  File "/root/.virtualenvs/picreana/lib/python2.7/site-packages/reana_cluster/cli/cluster.py", line 162, in init
    backend.init(traefik)
  File "/root/.virtualenvs/picreana/lib/python2.7/site-packages/reana_cluster/backends/kubernetes/k8s.py", line 350, in init
    manifest)
  File "/root/.virtualenvs/picreana/lib/python2.7/site-packages/reana_cluster/backends/kubernetes/k8s.py", line 464, in _add_service_acc_key_to_component
    'default', include_uninitialized='false')
  File "/root/.virtualenvs/picreana/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 12884, in list_namespaced_secret
    (data) = self.list_namespaced_secret_with_http_info(namespace, **kwargs)
  File "/root/.virtualenvs/picreana/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 12921, in list_namespaced_secret_with_http_info
    " to method list_namespaced_secret" % key
TypeError: Got an unexpected keyword argument 'include_uninitialized' to method list_namespaced_secret

Do you know what's happening? Any help would be much appreciated!
Many thanks in advance,

@roksys
Contributor

roksys commented Jul 17, 2019

Hi @jordidem,

REANA-Cluster 0.5.0 has no upper version limit for the Kubernetes package - https://github.com/reanahub/reana-cluster/blob/14702db6d579cc2d31c56fcfe4ce73aded1bd7d0/setup.py#L55
So I guess you got Kubernetes 10 installed in your virtualenv, which is incompatible with REANA-Cluster 0.5.0.

The easiest and fastest fix would be to downgrade the Kubernetes version in your virtualenv:
$ pip install 'kubernetes==9.*'
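To confirm which version actually ended up in the virtualenv (a quick check on my part, not from the thread):

$ pip freeze | grep kubernetes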

@jordidem

jordidem commented Jul 17, 2019 via email

@tiborsimko
Member

@roksys @diegodelemos We should perhaps revive the topic of pinning all dependencies and using something like PyUp to help with a periodic upgrade schedule.

@jordidem

jordidem commented Jul 25, 2019

Dear all,

It's Jordi again. We are still trying to deploy REANA on Rancher and we are facing additional problems. This is our Rancher deployment:

[screenshot: Rancher deployment overview]

These are the versions in use:
(picreana) [root@reana-server-test configurations]# kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:53:57Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

These are our main problems to solve:

  1. The command reana-cluster env is returning None as the URL:
    (picreana) [root@reana-server-test configurations]# reana-cluster env --include-admin-token
    export REANA_SERVER_URL=http://None:80
    export REANA_ACCESS_TOKEN=vetNHw...

Do you know what is happening?

  2. But OK, we force the value of REANA_SERVER_URL and continue with the process to start the execution of the example:
(myreana):~/reana-client/run$ export REANA_SERVER_URL=http://192.168.96.xxx:31241/
(myreana):~/reana-client/run$ export REANA_ACCESS_TOKEN=vetNHw....

(myreana):~/reana-client/run$ reana-client ping
Connected to http://192.168.96.xxx:31241/ - Server is running.

(myreana):~/reana-client/run$ export REANA_WORKON=test1
(myreana):~/reana-client/run$ reana-client create -n test1
test1.2
(myreana):~/reana-client/run$ reana-client upload
File code/helloworld.py was successfully uploaded.
File data/names.txt was successfully uploaded.
(myreana):~/reana-client/run$ reana-client start
test1 is running
(myreana):~/reana-client/run$ reana-client status
NAME    RUN_NUMBER   CREATED               STATUS    PROGRESS
test1   2            2019-07-25T14:19:16   running   -/-
(myreana):~/reana-client/run$ reana-client download
File results/greetings.txt could not be downloaded: results/greetings.txt does not exist.

The task is not running and we get this output: PROGRESS -/-, so when we try to download the output, it is not in the workspace. We don't understand why the task is not running, and we couldn't find any hint in the logs.

  3. The last thing we are trying to understand is the initialization of the Ingress:

[screenshot: Ingress stuck in the initializing state]

So the initialization never ends.

Thanks again for your help!

@tiborsimko
Member

(1) Regarding the None value for the URL detection, please see reanahub/reana-cluster#73; there are some hints there on how to look up the value if the current detection fails.

(2) For debugging running workflows, the best technique is to use kubectl get pods and kubectl logs on each pod to see what's happening. The status -/- means that the workflow status is unknown, perhaps the workflow has not started or perhaps the status cannot be updated. We saw it happening in the past when there were network connection issues between pods. Finally, you can also run reana-client ls -w mytest to inspect the workspace of the mytest workflow to see any created files there.
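For example, a minimal debugging pass along those lines might look like this (the pod name is illustrative; the real names come from the get pods output):

kubectl get pods                              # find the workflow/job pods and their states
kubectl logs workflow-controller-xxxxxxxxxx   # illustrative pod name; inspect each suspicious pod
reana-client ls -w test1                      # list the files currently in the test1 workspace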

(3) Regarding ingress, you may want to check Ben's write-up https://bengalewsky.github.io/openstack/reana/hep/scailfin/2019/01/24/ZeroToReanaOnOpenstack.html containing musings about installing REANA on non-CERN infrastructure. There are parts touching ingress which may be perhaps useful for your scenario.

@jordidem

jordidem commented Sep 2, 2019

One of the main issues we had deploying the REANA cluster using Rancher was related to the mount paths for the job pods, in the context of using REANA with the local disk of a virtualized server that hosts the reana-server.

The issue caused the job to fail because the mount point and the root path for the user/workflow workspace were not properly set. After asking for some help in the Gitter chat, we managed to solve this with the following workaround, extracted from gitlawr's comment on rancher/rancher#14836. Literally, we did the following steps:

  1. Edit the cluster and choose "Edit as YAML"
  2. Add the following flags for kubelet using Rancher:

services:
  kubelet:
    extra_args:
      containerized: "true"
    extra_binds:
      - "/:/rootfs:rshared"

  3. Click "Save" and wait until the cluster is updated.

Notes:
The community is planning to deprecate the "--containerized" flag for kubelet (kubernetes/kubernetes#74148), but the flag is essential for this capability as there is no alternative at the moment.
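A quick way to verify that the bind actually reached the kubelet after the cluster update (assuming the RKE-provisioned kubelet runs as a Docker container named kubelet, which is the default for Rancher custom clusters):

docker inspect kubelet --format '{{ json .HostConfig.Binds }}'   # should include "/:/rootfs:rshared"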

@diegodelemos diegodelemos changed the title cli: cluster init fails due to "Certificate did not match expected hostname" Deploying on Rancher Jul 29, 2020
@diegodelemos diegodelemos transferred this issue from reanahub/reana-cluster Jul 29, 2020