GMS fails with NPE when mysql is not setup correctly #2639

Closed
dexter-mh-lee opened this issue Jun 3, 2021 · 11 comments
Labels
bug Bug report devops PR or Issue related to DataHub backend & deployment


@dexter-mh-lee
Contributor

Describe the bug
The following NullPointerException (NPE) is thrown while GMS starts up if MySQL is not set up correctly (i.e. the datahub database doesn't exist).
This error message gives no signal about what went wrong, which makes the problem very hard to debug and fix.

ERROR ContextLoader Context initialization failed
 org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'dataProcessDAO' defined in com.linkedin.gms.factory.dataprocess.DataProcessDAOFactory: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.linkedin.metadata.dao.BaseLocalDAO]: Factory method 'createInstance' threw exception; nested exception is java.lang.NullPointerException
        at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:656)
        at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:484)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1338)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1177)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:557)
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517)
        at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323)
        at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
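Spring nests the failure here (BeanCreationException -> BeanInstantiationException -> NullPointerException), and the innermost exception carries no message. A small JDK-only helper that walks the cause chain makes that explicit; the class name `CauseChain` is hypothetical and not part of DataHub or Spring, it is only a sketch for inspecting traces like this one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: describes every exception in a cause chain, outermost first. */
final class CauseChain {

    private CauseChain() {}

    static List<String> describe(Throwable top) {
        List<String> lines = new ArrayList<>();
        // Identity-based set so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = top; t != null && seen.add(t); t = t.getCause()) {
            // getMessage() may be null, which string concatenation renders as "null".
            lines.add(t.getClass().getSimpleName() + ": " + t.getMessage());
        }
        return lines;
    }
}
```

For the trace above, the last entry would be a NullPointerException with a null message, which is why the report asks for an explicit, descriptive check instead.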

To Reproduce
Steps to reproduce the behavior:

  1. Create a MySQL container without the datahub db
  2. Run quickstart.sh
  3. Notice that GMS fails to start with the above error message

Expected behavior
It should clearly indicate that the database has not been set up correctly, so that the operator can go and fix it.
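One way to get this behavior is a fail-fast check that runs before the DAO beans are created and names the missing database explicitly. Below is a minimal JDK-only sketch; `DatabasePreflight`, `check`, `listCatalogs` and `requireDatabase` are hypothetical names, not DataHub's actual code, and it assumes the JDBC driver reports MySQL databases as catalogs (as far as I know this is MySQL Connector/J's default).

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check, meant to run before any DAO bean is created. */
final class DatabasePreflight {

    private DatabasePreflight() {}

    /** Lists the databases visible to this connection, then validates the expected one. */
    static void check(Connection connection, String expectedDatabase) throws SQLException {
        requireDatabase(listCatalogs(connection), expectedDatabase);
    }

    /** Database names visible to this connection (assumes MySQL databases map to JDBC catalogs). */
    static List<String> listCatalogs(Connection connection) throws SQLException {
        List<String> names = new ArrayList<>();
        try (ResultSet rs = connection.getMetaData().getCatalogs()) {
            while (rs.next()) {
                names.add(rs.getString("TABLE_CAT"));
            }
        }
        return names;
    }

    /** Throws an actionable error instead of letting a later NullPointerException surface. */
    static void requireDatabase(List<String> visibleDatabases, String expectedDatabase) {
        if (visibleDatabases.contains(expectedDatabase)) { // exact, case-sensitive match
            return;
        }
        throw new IllegalStateException(
                "Database '" + expectedDatabase + "' does not exist or is not visible to the configured"
                        + " MySQL user (visible databases: " + visibleDatabases + ")."
                        + " Check that the setup step that creates it (e.g. mysql-setup) completed successfully.");
    }
}
```

The pure `requireDatabase` part is separated from the JDBC lookup so the message logic can be tested without a running MySQL server; how such a check would be wired into the GMS Spring context is left open here.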

@dexter-mh-lee dexter-mh-lee added the bug Bug report label Jun 3, 2021
@KulykDmytro
Contributor

KulykDmytro commented Jun 13, 2021

Facing a similar issue when setting up with the provided Helm chart (using prerequisites)
version: 0.8.1

@amorskoy

amorskoy commented Jul 9, 2021

In addition, when I run docker/dev.sh on a fresh git clone, mysql-setup periodically fails, so the MySQL container does not have the datahub database and GMS fails.
Here are the mysql-setup logs:

2021/07/09 11:22:40 Waiting for: tcp://mysql:3306
2021/07/09 11:22:40 Problem with dial: dial tcp 172.18.0.5:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:22:41 Problem with dial: dial tcp 172.18.0.5:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:22:42 Problem with dial: dial tcp 172.18.0.5:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:22:43 Connected to tcp://mysql:3306
-- create datahub database
CREATE DATABASE IF NOT EXISTS datahub;
USE datahub;

-- create metadata aspect table
create table if not exists metadata_aspect_v2 (
  urn                           varchar(500) not null,
  aspect                        varchar(200) not null,
  version                       bigint(20) not null,
  metadata                      longtext not null,
  systemmetadata                longtext,
  createdon                     datetime(6) not null,
  createdby                     varchar(255) not null,
  createdfor                    varchar(255),
  constraint pk_metadata_aspect_v2 primary key (urn,aspect,version)
);

-- create default records for datahub user if not exists
CREATE TABLE temp_metadata_aspect_v2 LIKE metadata_aspect_v2;
INSERT INTO temp_metadata_aspect_v2 (urn, aspect, version, metadata, createdon, createdby) VALUES(
  'urn:li:corpuser:datahub',
  'corpUserInfo',
  0,
  '{"displayName":"Data Hub","active":true,"fullName":"Data Hub","email":"datahub@linkedin.com"}',
  now(),
  'urn:li:principal:datahub'
), (
  'urn:li:corpuser:datahub',
  'corpUserEditableInfo',
  0,
  '{"skills":[],"teams":[],"pictureLink":"https://raw.githubusercontent.com/linkedin/datahub/master/datahub-web/packages/data-portal/public/assets/images/default_avatar.png"}',
  now(),
  'urn:li:principal:datahub'
);
-- only add default records if metadata_aspect is empty
INSERT INTO metadata_aspect_v2
SELECT * FROM temp_metadata_aspect_v2
WHERE NOT EXISTS (SELECT * from metadata_aspect_v2);
DROP TABLE temp_metadata_aspect_v2;

-- create metadata index table
CREATE TABLE IF NOT EXISTS metadata_index (
 `id` BIGINT NOT NULL AUTO_INCREMENT,
 `urn` VARCHAR(200) NOT NULL,
 `aspect` VARCHAR(150) NOT NULL,
 `path` VARCHAR(150) NOT NULL,
 `longVal` BIGINT,
 `stringVal` VARCHAR(200),
 `doubleVal` DOUBLE,
 CONSTRAINT id_pk PRIMARY KEY (id),
 INDEX longIndex (`urn`,`aspect`,`path`,`longVal`),
 INDEX stringIndex (`urn`,`aspect`,`path`,`stringVal`),
 INDEX doubleIndex (`urn`,`aspect`,`path`,`doubleVal`)
);
ERROR 1130 (HY000): Host '172.18.0.8' is not allowed to connect to this MySQL server
2021/07/09 11:22:43 Command exited with error: exit status 1
2021/07/09 11:22:43 Command exited with error: exit status 1
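The log shows a simple wait loop: dial tcp://mysql:3306, sleep 1s on failure, and run the SQL once the dial succeeds. A hypothetical JDK-only sketch of that loop follows (`TcpWait` is an illustrative name, not the tool mysql-setup actually uses). Note what the failing run shows: the dial succeeded ("Connected to tcp://mysql:3306") and the SQL still failed with ERROR 1130, so an open port alone does not prove the server will accept this client.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical sketch of the "Waiting for: tcp://host:port" loop seen in the mysql-setup log. */
final class TcpWait {

    private TcpWait() {}

    /**
     * Tries to open a TCP connection up to maxAttempts times, sleeping between failures.
     * Returns true on the first successful connect. Success only proves the port is open;
     * it says nothing about whether the server will accept this client's login.
     */
    static boolean waitFor(String host, int port, int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 1000); // 1s connect timeout
                return true;
            } catch (IOException e) {
                System.err.println("Problem with dial: " + e.getMessage() + ". Sleeping " + sleepMillis + "ms");
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        return false;
    }
}
```

A more robust readiness check would log in and run a trivial query as the same user the setup script uses, since that is the step that failed here.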

@amorskoy

amorskoy commented Jul 9, 2021

And, by contrast, to demonstrate how flaky this is: after a couple of rounds of ./nuke.sh && docker.sh, MySQL and GMS finally started:

2021/07/09 11:52:44 Waiting for: tcp://mysql:3306
2021/07/09 11:52:44 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:45 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:46 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:47 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:48 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:49 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:50 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:51 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:52 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:53 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:54 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:55 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:56 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:57 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:58 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:52:59 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:53:00 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:53:01 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:53:02 Problem with dial: dial tcp 172.20.0.2:3306: getsockopt: connection refused. Sleeping 1s
2021/07/09 11:53:03 Connected to tcp://mysql:3306
-- create datahub database
CREATE DATABASE IF NOT EXISTS datahub;
USE datahub;

-- create metadata aspect table
create table if not exists metadata_aspect_v2 (
  urn                           varchar(500) not null,
  aspect                        varchar(200) not null,
  version                       bigint(20) not null,
  metadata                      longtext not null,
  systemmetadata                longtext,
  createdon                     datetime(6) not null,
  createdby                     varchar(255) not null,
  createdfor                    varchar(255),
  constraint pk_metadata_aspect_v2 primary key (urn,aspect,version)
);

-- create default records for datahub user if not exists
CREATE TABLE temp_metadata_aspect_v2 LIKE metadata_aspect_v2;
INSERT INTO temp_metadata_aspect_v2 (urn, aspect, version, metadata, createdon, createdby) VALUES(
  'urn:li:corpuser:datahub',
  'corpUserInfo',
  0,
  '{"displayName":"Data Hub","active":true,"fullName":"Data Hub","email":"datahub@linkedin.com"}',
  now(),
  'urn:li:principal:datahub'
), (
  'urn:li:corpuser:datahub',
  'corpUserEditableInfo',
  0,
  '{"skills":[],"teams":[],"pictureLink":"https://raw.githubusercontent.com/linkedin/datahub/master/datahub-web/packages/data-portal/public/assets/images/default_avatar.png"}',
  now(),
  'urn:li:principal:datahub'
);
-- only add default records if metadata_aspect is empty
INSERT INTO metadata_aspect_v2
SELECT * FROM temp_metadata_aspect_v2
WHERE NOT EXISTS (SELECT * from metadata_aspect_v2);
DROP TABLE temp_metadata_aspect_v2;

-- create metadata index table
CREATE TABLE IF NOT EXISTS metadata_index (
 `id` BIGINT NOT NULL AUTO_INCREMENT,
 `urn` VARCHAR(200) NOT NULL,
 `aspect` VARCHAR(150) NOT NULL,
 `path` VARCHAR(150) NOT NULL,
 `longVal` BIGINT,
 `stringVal` VARCHAR(200),
 `doubleVal` DOUBLE,
 CONSTRAINT id_pk PRIMARY KEY (id),
 INDEX longIndex (`urn`,`aspect`,`path`,`longVal`),
 INDEX stringIndex (`urn`,`aspect`,`path`,`stringVal`),
 INDEX doubleIndex (`urn`,`aspect`,`path`,`doubleVal`)
);
2021/07/09 11:53:03 Command finished successfully.
2021/07/09 11:53:03 Command finished successfully.

@jjoyce0510
Collaborator

"ERROR 1130 (HY000): Host '172.18.0.8' is not allowed to connect to this MySQL server" --> This is interesting. Seems that MySQL container is somehow denied from accessing MySQL, as opposed to MySQL being down. Can you tell if that container is up when you see this issue?

How are you folks deploying DataHub? Docker compose, helm, etc?

@amorskoy

"ERROR 1130 (HY000): Host '172.18.0.8' is not allowed to connect to this MySQL server" --> This is interesting. Seems that MySQL container is somehow denied from accessing MySQL, as opposed to MySQL being down. Can you tell if that container is up when you see this issue?

How are you folks deploying DataHub? Docker compose, helm, etc?

In my case it is just a docker-compose run:

./gradlew build
cd docker
./dev.sh

MySQL is running when that error occurs; however, there is no datahub database inside. And just a reminder: this is pretty flaky. When I run "./nuke.sh" and then "./dev.sh", it succeeds after 1-2 tries.

@amorskoy

@jjoyce0510 I believe I have a working hypothesis for this flake; it is related to docker/docker-compose.override.yml:

services:
  mysql:
    container_name: mysql
    hostname: mysql
    image: mysql:5.7
    env_file: mysql/env/docker.env
    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
    ports:
      - "3306:3306"
    volumes:
      - ./mysql/init.sql:/docker-entrypoint-initdb.d/init.sql
      - mysqldata:/var/lib/mysql

My guess is that when running docker/dev.sh, it is not guaranteed that the override will be picked up for the mysql service; instead, a container may be pulled and started from the default mysql image without the overrides. Maybe a docker-compose bug?

@amorskoy

@jjoyce0510 just for the record: this is not a real fix, only a workaround. To avoid the override flakiness, I edited docker/dev.sh to run on a merged compose config:

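# Resolve the script's directory first, so the merge reads and writes files next to dev.sh
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
cd "$DIR" || exit 1

# Merge base, override and dev files into one explicit config (without variable interpolation)
docker-compose -f docker-compose.yml \
  -f docker-compose.override.yml \
  -f docker-compose.dev.yml config --no-interpolate > docker-compose.merged.yml || exit 1

# Start DataHub from the pre-merged file only; quote "$@" to pass arguments through intact
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub \
  -f docker-compose.merged.yml \
  up --build "$@"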

@anshbansal anshbansal added the devops PR or Issue related to DataHub backend & deployment label Jul 25, 2022
@jjoyce0510
Collaborator

Thank you guys for the context!

@jjoyce0510
Collaborator

@amorskoy Do you see any clear path to pushing a workaround back to the original dev.sh script? It might be useful for others as well

@amorskoy

Hi @jjoyce0510. Sorry, I have lost the context since I moved to another company/project this year, so there is no way for me to restore all of this, sorry ((
But basically this post describes 100% of my local fix:

  • merge all docker-compose files into one (to eliminate the buggy override logic at runtime)
  • run DataHub using this pre-merged docker-compose file.

My approach is naive: I inlined the merge command into dev.sh, so it merges on every run. But as I recall, it worked fine.

A better approach would be to understand why the overrides were not always picked up by docker-compose. I tried that, but the community had no clear workaround.

@anshbansal
Collaborator

As this is a very old issue which we don't have full context on now I'll close it. If this issue is still happening in latest releases we would love to have someone with full context add the details so we can work on it.

Projects
None yet
Development

No branches or pull requests

5 participants