This repository was archived by the owner on Mar 31, 2023. It is now read-only.

Conversation

@chenpiaoping
Contributor

@chenpiaoping chenpiaoping commented Jul 17, 2020

This PR includes the following features:

  1. Refactoring of port-manager.
  2. VPC supports batch acquisition.
  3. Subnet supports batch acquisition.
  4. MAC address supports batch creation.
  5. IP address supports batch creation.
  6. Security groups support batch acquisition and binding/unbinding.

@chenpiaoping chenpiaoping force-pushed the port-manager branch 2 times, most recently from 451e09a to d578881 Compare July 17, 2020 10:08
Contributor

@xieus xieus left a comment

Some early comments. I will continue to review the rest.

if (startTimes.containsKey(id)) {
startTime = startTimes.get(id);
} else {
startTime = endTime;
Contributor

Will this give us some false positive information that the duration is zero? :-)
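One way to avoid the false zero raised in this comment is to report "no duration" when the start time is missing, instead of substituting the end time. The following is a minimal sketch, not the PR's code: the class `DurationTracker` and its methods are hypothetical names introduced only for illustration.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper; the names are illustrative and not taken from the PR. */
final class DurationTracker {
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();

    /** Records the start timestamp (in milliseconds) for a request id. */
    void start(String id, long startTime) {
        startTimes.put(id, startTime);
    }

    /**
     * Returns the elapsed time for the id, or an empty value when no start
     * time was recorded, so a missing start is never reported as 0 ms.
     */
    OptionalLong finish(String id, long endTime) {
        Long startTime = startTimes.remove(id);
        if (startTime == null) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(endTime - startTime);
    }
}
```

A caller can then skip or separately count the empty results, which keeps them out of the average and percentile figures.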

@PostMapping("/ips")
@ResponseBody
@ResponseStatus(HttpStatus.CREATED)
@DurationStatistics
Contributor

This is a very good annotation. Consider applying it to all other services as well 👍
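How the PR implements `@DurationStatistics` is not shown in this conversation. As a rough illustration of the idea (time every call to an annotated method), here is a sketch that uses a plain JDK dynamic proxy as a stand-in. The annotation definition, the `IpService` interface, and the `TimingProxy` class are all assumptions made for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Stand-in for the PR's annotation; the real definition may differ. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DurationStatistics {}

/** Hypothetical service interface used only for this illustration. */
interface IpService {
    @DurationStatistics
    String allocate(String subnetId);

    String describe();
}

final class TimingProxy {
    /** Durations (in nanoseconds) recorded for annotated method calls. */
    static final List<Long> RECORDED_NANOS = new CopyOnWriteArrayList<>();

    /** Wraps target so that calls to annotated interface methods are timed. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> iface, T target) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (proxy, method, args) -> {
                    boolean timed = method.isAnnotationPresent(DurationStatistics.class);
                    long start = System.nanoTime();
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the target's own exception
                    } finally {
                        if (timed) {
                            RECORDED_NANOS.add(System.nanoTime() - start);
                        }
                    }
                });
    }
}
```

Only `allocate` is recorded here because only it carries the annotation; applying the annotation to other services would then be a one-line change per method.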

@xieus xieus added the refactor Code refactor and improvement label Jul 17, 2020
@xieus xieus added this to the Version 0.7.2020.07.30 milestone Jul 17, 2020
@chenpiaoping chenpiaoping force-pushed the port-manager branch 2 times, most recently from 13d5c8d to f752478 Compare July 20, 2020 01:55
@chenpiaoping
Contributor Author

image

@chenpiaoping
Contributor Author

chenpiaoping commented Jul 20, 2020

After optimization, the performance data for creating a single port is as follows:
Samples: 4877
Average: 10
Median: 10
90% Line: 12 (90% of the samples took no more than 12)
95% Line: 13 (95% of the samples took no more than 13)
99% Line: 17 (99% of the samples took no more than 17)
Min: 7
Maximum: 221
Error %: 0
Throughput: 81.3/sec
Received KB/sec: 95.60
Sent KB/sec: 42.31

Port configuration:
{
  "port": {
    "network_id": "5687abe4-9ae3-446a-979d-04d7e486646c",
    "tenant_id": "3d53801c-32ce-4e97-9572-bb966f476ec",
    "allowed_address_pairs": [{
      "ip_address": "11.11.11.101",
      "mac_address": "00:01:6C:06:A6:29"
    }]
  }
}
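For readers unfamiliar with the "N% Line" figures reported above, the sketch below shows one common way to derive them from raw latency samples, the nearest-rank percentile. This is an assumption about the method: the load-testing tool that produced the numbers may compute its percentiles differently. `LatencyStats` is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical helper for summarizing latency samples. */
final class LatencyStats {
    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * `percent` percent of all samples are less than or equal to it.
     */
    static long percentile(long[] samples, int percent) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 1..100");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Integer ceiling of percent * n / 100 avoids floating-point rounding.
        int rank = (int) (((long) percent * sorted.length + 99) / 100);
        return sorted[rank - 1];
    }

    /** Arithmetic mean of the samples, or NaN for an empty array. */
    static double average(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }
}
```

A single slow outlier, such as the reported maximum of 221, pulls the average up but leaves the lower percentile lines almost unchanged, which is why both kinds of figures are worth reporting.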

@codecov-commenter

codecov-commenter commented Jul 20, 2020

Codecov Report

Merging #301 into master will decrease coverage by 3.40%.
The diff coverage is 47.81%.

@@             Coverage Diff              @@
##             master     #301      +/-   ##
============================================
- Coverage     37.55%   34.15%   -3.41%     
- Complexity      550      859     +309     
============================================
  Files           241      353     +112     
  Lines          5680     8869    +3189     
  Branches        575     1078     +503     
============================================
+ Hits           2133     3029     +896     
- Misses         3292     5438    +2146     
- Partials        255      402     +147     
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...i/alcor/apigateway/AlcorApiGatewayApplication.java | 33.33% <ø> (ø) | 1.00 <0.00> (ø) |
| ...om/futurewei/alcor/dataplane/DataPlaneManager.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...a/com/futurewei/alcor/dataplane/config/Config.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...aplane/config/grpc/GoalStateProvisionerClient.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...ne/config/serialization/GoalStateDeserializer.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...lane/config/serialization/GoalStateSerializer.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...rewei/alcor/dataplane/controller/GSController.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...alcor/dataplane/exception/ACAFailureException.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...alcor/dataplane/exception/DPMFailureException.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...aplane/service/impl/MizarGoalStateServiceImpl.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ... and 318 more | | |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8e72e78...5d034e9. Read the comment docs.

@xieus
Contributor

xieus commented Jul 20, 2020

@chenpiaoping There are some code conflicts due to PR #304, which was just merged.

@xieus
Contributor

xieus commented Jul 20, 2020

@chenpiaoping The result is outstanding. I will expedite the PR review, and let us try to get the PR into the release.

Couple of questions:

  • When we generate the test payload, do we automatically generate different configurations (UUID, IP address, MAC address, etc.) for different ports? This would impact performance, because if we use the same configuration, we always hit the same record.
  • The sample size is good enough to show some initial results. Can we increase it to 10,000, 50,000, 100,000, and 500,000 to see the latency and throughput curve of a single-node controller? Basically, we want to find the breaking point.
  • Ignite: do we use one instance of the DB, or multiple instances? Maybe that could be part of the metrics too.

Cheers!
@xieus
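As an illustration of the first question, the sketch below generates a distinct payload per port so that a load test does not keep hitting the same record. It reuses the field names from the port configuration quoted earlier in this conversation; the class `PortPayloadGenerator`, the `11.11.0.0/16` address range, and the `00:01:6C` MAC prefix are assumptions made for the example, not details confirmed by the PR.

```java
import java.util.Locale;

/** Hypothetical generator of distinct port-creation payloads for load tests. */
final class PortPayloadGenerator {
    /** Number of distinct addresses this sketch can produce (256 * 254). */
    static final int MAX_PORTS = 256 * 254;

    private final String networkId;
    private final String tenantId;

    PortPayloadGenerator(String networkId, String tenantId) {
        this.networkId = networkId;
        this.tenantId = tenantId;
    }

    /** Maps an index to a unique address in the assumed 11.11.0.0/16 range. */
    String ip(int index) {
        checkIndex(index);
        return String.format(Locale.ROOT, "11.11.%d.%d", index / 254, index % 254 + 1);
    }

    /** Maps an index to a unique MAC address under an assumed fixed prefix. */
    String mac(int index) {
        checkIndex(index);
        return String.format(Locale.ROOT, "00:01:6C:%02X:%02X:%02X",
                (index >> 16) & 0xFF, (index >> 8) & 0xFF, index & 0xFF);
    }

    /** Builds the JSON request body for the port with the given index. */
    String payload(int index) {
        return "{\"port\": {"
                + "\"network_id\": \"" + networkId + "\", "
                + "\"tenant_id\": \"" + tenantId + "\", "
                + "\"allowed_address_pairs\": [{"
                + "\"ip_address\": \"" + ip(index) + "\", "
                + "\"mac_address\": \"" + mac(index) + "\"}]}}";
    }

    private static void checkIndex(int index) {
        if (index < 0 || index >= MAX_PORTS) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
    }
}
```

Deriving the addresses from a counter rather than a random generator guarantees that no two payloads collide, at the cost of a fixed upper bound on the number of ports.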

Contributor

@xieus xieus left a comment

@chenpiaoping Some more comments, mostly minor.

I was wondering if we have a plan to recover IpAddrTest and IpRangeTest, which include good tests.

*/
package com.futurewei.alcor.portmanager.processor;

import org.slf4j.Logger;
Contributor

Can we use the Alcor logger, or do you need something special from slf4j?

Contributor Author

I thought alcor would use slf4j directly.

}

public String getVpcId() {
return vpcId;
Contributor

Does the @Data annotation not work here?

Contributor Author

As far as I know, @Data is no longer recommended because it generates code at compile time and has poor security and maintainability.

Contributor

Okay, thanks. Let us remind the team of this annotation's limitations and start cleaning it up wherever we see it.

import static com.futurewei.alcor.securitygroup.utils.RestParameterValidator.*;

@RestController
@ComponentScan(value = "com.futurewei.alcor.common.stats")
Contributor

@chenpiaoping Can you test the SG APIs from the API GW to see if anything is unexpected? PR #304 recently introduced a minor change to the List SG API. The List SG contract was not fully compatible with Neutron. Can you check the others as well?

Thanks.

Contributor

You can do it in a separate, small PR if any change is needed.

Contributor Author

OK, I will do that.

@chenpiaoping
Contributor Author

When we generate the test payload, do we automatically generate different configurations (UUID, IP address, MAC address, etc.) for different ports? This would impact performance, because if we use the same configuration, we always hit the same record.
> PM assigns a random IP address and MAC address to ports that do not have them.

The sample size is good enough to show some initial results. Can we increase it to 10,000, 50,000, 100,000, and 500,000 to see the latency and throughput curve of a single-node controller? Basically, we want to find the breaking point.
> From the test results, the QPS is about 81.3/sec, that is, about 5,000/min. If the sender's rate exceeds this value, the average processing time of each request will increase.

Ignite: do we use one instance of the DB, or multiple instances? Maybe that could be part of the metrics too.
> I deployed one Ignite instance on my laptop. The only significance of this test result is that it is much better than the previous test results in the same environment (my laptop).
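The arithmetic behind the throughput answer can be checked with a short sketch. It converts the reported rate to requests per minute and applies Little's law (average requests in flight = arrival rate × average latency). The latency unit is an assumption: the report does not state it, and milliseconds is assumed here. `ThroughputMath` is a hypothetical name.

```java
/** Hypothetical helper for back-of-the-envelope throughput arithmetic. */
final class ThroughputMath {
    /** Converts a per-second request rate to requests per minute. */
    static double perMinute(double perSecond) {
        return perSecond * 60.0;
    }

    /**
     * Little's law: average number of requests in flight equals the
     * arrival rate multiplied by the average time each request takes.
     * The latency is assumed to be given in milliseconds.
     */
    static double averageInFlight(double perSecond, double latencyMillis) {
        return perSecond * (latencyMillis / 1000.0);
    }
}
```

With the reported figures, `perMinute(81.3)` is about 4,878, consistent with "about 5,000/min", and `averageInFlight(81.3, 10)` is about 0.8. If the millisecond assumption holds, that would suggest the sender issued requests roughly one at a time, so the measured rate may reflect the test's concurrency rather than the controller's limit.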

@chenpiaoping chenpiaoping force-pushed the port-manager branch 6 times, most recently from ece36ae to 3197a81 Compare July 24, 2020 04:44
@chenpiaoping
Contributor Author

@xieus
The problem of ignite deadlock, which I mentioned at the last regular meeting, appeared in the K8s environment:
"http-nio-8080-exec-6" #22 daemon prio=5 os_prio=0 cpu=70.00ms elapsed=635.62s tid=0x00007fddfe53d800 nid=0x1b runnable [0x00007fddb87a6000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(java.base@11.0.5/Native Method)
at java.net.SocketInputStream.socketRead(java.base@11.0.5/SocketInputStream.java:115)
at java.net.SocketInputStream.read(java.base@11.0.5/SocketInputStream.java:168)
at java.net.SocketInputStream.read(java.base@11.0.5/SocketInputStream.java:140)
at org.apache.ignite.internal.client.thin.TcpClientChannel$ByteCountingDataInput.read(TcpClientChannel.java:535)
at org.apache.ignite.internal.client.thin.TcpClientChannel$ByteCountingDataInput.readInt(TcpClientChannel.java:572)
at org.apache.ignite.internal.client.thin.TcpClientChannel.processNextResponse(TcpClientChannel.java:272)
at org.apache.ignite.internal.client.thin.TcpClientChannel.receive(TcpClientChannel.java:234)
at org.apache.ignite.internal.client.thin.TcpClientChannel.service(TcpClientChannel.java:171)
at org.apache.ignite.internal.client.thin.ReliableChannel.service(ReliableChannel.java:180)
at org.apache.ignite.internal.client.thin.GenericQueryPager.next(GenericQueryPager.java:71)
at org.apache.ignite.internal.client.thin.ClientQueryCursor$1.nextPage(ClientQueryCursor.java:93)
at org.apache.ignite.internal.client.thin.ClientQueryCursor$1.hasNext(ClientQueryCursor.java:76)
at org.apache.ignite.internal.client.thin.ClientQueryCursor.getAll(ClientQueryCursor.java:47)
at com.futurewei.alcor.common.db.ignite.IgniteClientDbCache.getAll(IgniteClientDbCache.java:182)
at com.futurewei.alcor.common.db.ignite.IgniteClientDbCache.getAll(IgniteClientDbCache.java:175)

@chenpiaoping chenpiaoping force-pushed the port-manager branch 2 times, most recently from c66982e to 8d593dc Compare July 27, 2020 02:13
@chenpiaoping
Contributor Author

@xieus This PR is ready to be reviewed.

@chenpiaoping
Contributor Author

The time cost of calling CreatePortBulk to create 50 ports (the body contains 50 ports) is as follows:
image

Contributor

@xieus xieus left a comment

> @xieus This PR is ready to be reviewed.

Thanks @chenpiaoping. Somehow I missed this message. Let me do the review this afternoon.

@@ -0,0 +1,9 @@
package com.futurewei.alcor.portmanager.processor;


Contributor

extra line.

Contributor

@xieus xieus left a comment

We have Port Manager 2.0 now 👍

@chenpiaoping It may be a good idea for you to give a high-level introduction to the new Port Manager design and implementation in the open-source meeting. This is a pretty decent implementation.

}

@Override
void createProcess(PortContext context) {
Contributor

This createProcess only applies to ports with a binding host ID. How about those without one?

import java.util.*;

@Component
public class ProcessorManager {
Contributor

Can you add some comments to ProcessManager, AbstractProcess, PortProcess, and the other types of processes?

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class RequestManager {
Contributor

Please add some comments to RequestManager and to the other important requests as well.

@xieus xieus changed the title Refactoring Port Manager [Microservice] Port Manager 2.0 Implementation Aug 10, 2020
@xieus xieus merged commit 76707cf into futurewei-cloud:master Aug 10, 2020
@xieus xieus linked an issue Oct 13, 2020 that may be closed by this pull request

Labels

refactor Code refactor and improvement

Development

Successfully merging this pull request may close these issues.

[Integration] Missing Neighbor Info in Cross-Node Scenario

4 participants