connections to google/apple not closed #67

Closed
honzasmuk opened this issue Dec 4, 2017 · 5 comments

@honzasmuk

We tried to load test our backend and push server. We were sending 1500 individual pushes (/message/send) at a rate of 20/s.
At first it looked fine, but about halfway through the load the push server started to fail with errors:

04-Dec-2017 11:25:06.605 SEVERE [http-nio-18066-Acceptor-0] org.apache.tomcat.util.net.NioEndpoint$Acceptor.run Socket accept failed
 java.io.IOException: Too many open files
	at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
	at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:682)
	at java.lang.Thread.run(Thread.java:748)

and:

java.lang.NoClassDefFoundError: io/getlime/core/rest/model/base/entity/Error
(but this is probably only a consequence of the Too many open files error)

The number of open file descriptors is indeed close to our Linux limit:

ls -l /proc/17510/fd | wc -l
4095
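
For anyone who wants to watch the same number from inside a JVM during a load test, here is a minimal sketch. It assumes Linux with /proc mounted; the class name FdCounter and the method countEntries are made up for this example and are not part of the push server.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of the push server.
class FdCounter {
    /** Counts the entries in a directory such as /proc/<pid>/fd. */
    static long countEntries(Path dir) throws IOException {
        // Files.list keeps a directory handle open, so close the stream when done.
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the current JVM; pass a PID to inspect another process.
        String pid = args.length > 0 ? args[0] : "self";
        System.out.println(countEntries(Paths.get("/proc", pid, "fd")));
    }
}
```

Note that `ls -l` prints a leading "total" line, so piping it through `wc -l` reports one more than the number of descriptors. When the sketch inspects its own process, the handle used for the listing itself is likely included in the count.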

It looks like the cause is one of the following:

  1. Connections to Google or Apple are not properly closed.
  2. We were sending requests at a faster rate than the push server was able to process. Even so, I would expect the push server to release its resources after the load test ended, but it was still holding the full number of open connections.
@petrdvorak petrdvorak self-assigned this Dec 4, 2017
@petrdvorak petrdvorak added this to the 0.18.0 milestone Dec 4, 2017
@petrdvorak
Member

A couple of additional questions:

  • Are you using the latest Push Server version (0.17.0)? We have fixed several issues on our end and updated the Pushy library.
  • Are the resources really never released, even after about a minute? We would like to make sure that there are no connections waiting for timeout.
  • Are you using a proxy configuration to communicate with APNS/FCM?

@honzasmuk
Author

honzasmuk commented Dec 5, 2017

No, we are using 0.16.0. Looking at the code, PushSenderService creates a new APNs client for every call to the /send endpoint:
https://github.com/lime-company/powerauth-push-server/blob/0.16.0-beta/powerauth-push-server/src/main/java/io/getlime/push/service/PushSenderService.java#L172

I think this may be the cause, because I cannot see the APNs client being closed anywhere. The same problem may exist for FCM.
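
To illustrate the suspected leak pattern, here is a minimal sketch. It does not use the real Pushy or FCM classes: FakeClient is a hypothetical stand-in whose constructor "opens" a resource and whose close() releases it, and the method names are invented for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ClientLeakDemo {
    /** Hypothetical stand-in for a push client that owns a network connection. */
    static final class FakeClient implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();

        FakeClient() { OPEN.incrementAndGet(); }          // simulates opening a socket

        void send(String message) { /* no-op in this sketch */ }

        @Override
        public void close() { OPEN.decrementAndGet(); }   // simulates releasing the socket
    }

    /** Per-call client that is never closed: one resource leaks per send. */
    static void sendLeaky(String message) {
        FakeClient client = new FakeClient();
        client.send(message);
    }

    /** Per-call client closed by try-with-resources, even if send throws. */
    static void sendAndClose(String message) {
        try (FakeClient client = new FakeClient()) {
            client.send(message);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1500; i++) sendAndClose("push " + i);
        System.out.println("open after closing sends: " + FakeClient.OPEN.get());
        for (int i = 0; i < 1500; i++) sendLeaky("push " + i);
        System.out.println("open after leaky sends: " + FakeClient.OPEN.get());
    }
}
```

With a real client, each leaked instance would hold at least one socket, which would be consistent with the descriptor count growing until the process limit is reached.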

Version 0.17.0 looks different: it holds a single APNs/FCM client instance per application ID. We will have to try it.
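
The one-client-per-application idea can be sketched roughly as follows. This is not the actual 0.17.0 code: ClientCache, FakeClient and clientFor are hypothetical names, and the real implementation may differ in how it builds and stores clients.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ClientCacheDemo {
    /** Hypothetical stand-in for an APNs or FCM client bound to one application. */
    static final class FakeClient {
        final long appId;

        FakeClient(long appId) { this.appId = appId; }
    }

    /** Keeps one long-lived client per application ID instead of one per request. */
    static final class ClientCache {
        private final Map<Long, FakeClient> clients = new ConcurrentHashMap<>();
        final AtomicInteger created = new AtomicInteger();

        FakeClient clientFor(long appId) {
            // computeIfAbsent creates the client at most once per key, even under concurrency.
            return clients.computeIfAbsent(appId, id -> {
                created.incrementAndGet();
                return new FakeClient(id);
            });
        }
    }

    public static void main(String[] args) {
        ClientCache cache = new ClientCache();
        // 1500 sends spread over two applications reuse the same two clients.
        for (int i = 0; i < 1500; i++) cache.clientFor(i % 2);
        System.out.println("clients created: " + cache.created.get());
    }
}
```

With this shape the number of open connections depends on the number of applications rather than on the number of requests.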

@petrdvorak
Member

Yes. The documentation for the underlying library was improved this October with the addition of the "Pushy - Best practices" page, and we updated the implementation based on it.

Please try 0.17.0. Note that the SQL schema has changed: new tables were added and field names became more consistent.

@petrdvorak
Member

petrdvorak commented Dec 5, 2017

Here is the difference script between 0.16.0 and 0.17.0:

--- create new sequences
CREATE SEQUENCE PUSH_CAMPAIGN_SEQ START WITH 1 INCREMENT BY 1;
CREATE SEQUENCE PUSH_CAMPAIGN_USER_SEQ START WITH 1 INCREMENT BY 1;

--- create new tables
CREATE TABLE PUSH_CAMPAIGN (
  ID NUMBER(19) PRIMARY KEY NOT NULL,
  APP_ID NUMBER(19) NOT NULL,
  MESSAGE VARCHAR(4000) NOT NULL,
  IS_SENT NUMBER(1) DEFAULT 0,
  TIMESTAMP_CREATED TIMESTAMP(6) NOT NULL,
  TIMESTAMP_SENT  TIMESTAMP(6),
  TIMESTAMP_COMPLETED  TIMESTAMP(6)
);

CREATE TABLE PUSH_CAMPAIGN_USER (
  ID NUMBER(19) PRIMARY KEY NOT NULL,
  CAMPAIGN_ID NUMBER(19) NOT NULL,
  USER_ID NUMBER(19) NOT NULL,
  TIMESTAMP_CREATED TIMESTAMP(6) NOT NULL
);

--- alter existing tables
ALTER TABLE POWERAUTH_PUSH_SERVER.PUSH_DEVICE_REGISTRATION RENAME COLUMN LAST_REGISTERED TO TIMESTAMP_LAST_REGISTERED;
ALTER TABLE POWERAUTH_PUSH_SERVER.PUSH_MESSAGE RENAME COLUMN SILENT TO IS_SILENT;
ALTER TABLE POWERAUTH_PUSH_SERVER.PUSH_MESSAGE RENAME COLUMN PERSONAL TO IS_PERSONAL;
ALTER TABLE POWERAUTH_PUSH_SERVER.PUSH_MESSAGE RENAME COLUMN ENCRYPTED TO IS_ENCRYPTED;

Also, make sure the sequence is named PUSH_DEVICE_REGISTRATION_SEQ, not PUSH_REGISTRATION_SEQ.

@petrdvorak
Member

This issue will very likely be fixed in 0.17.0, where we fixed the logic for creating / caching APNS and FCM clients for various applications.

If the issue persists, please feel free to reopen.
