[Feat] Use-cases]: Monitoring OPC UA with Netdata #562

Open
shyamvalsan opened this issue Aug 31, 2022 · 14 comments
Comments

@shyamvalsan

shyamvalsan commented Aug 31, 2022

Problem

  • Netdata cannot currently monitor OPC UA servers or related metrics (tags)
  • Current solutions for monitoring health, performance and usage of industrial automation systems are rigid and difficult to manage

Description

OPC UA is an open, industry-independent, secure connectivity framework for industrial automation data, designed for use across many industrial sectors.

Industrial plants have a large variety of machines and sensors that need to be monitored for safety, maintenance and operational efficiency. Easy and efficient access to this data will considerably improve the R&D efficiency of the companies operating these plants. Maintenance teams will be able to develop more efficient maintenance plans, and process engineers will be able to optimize their production lines. ML and AI use-cases will also become feasible with access to high-fidelity, reliable monitoring data.

There should be a Netdata collector that can connect to OPC UA server(s) and collect all the associated metric information (tags) from it.

Here are some useful links to get started:

Importance

really want

Value proposition

  1. Opens up a new market niche for Netdata: there are thousands of companies that operate industrial automation systems/PLCs, and if Netdata can offer a simple, flexible, feature-rich and cost-effective way to monitor these systems/machines, there is potential for a lot of connected nodes in the future.
@shyamvalsan
Author

This feature was requested by a user. Here is some feedback from a discussion I had with them.

  • Works as IS/IT coordinator in automation (large international manufacturer of heavy trucks), has many manufacturing plants, manufacturing components such as engines, gearboxes as well as assembly lines.
  • Has a large number (~400) of CNC machines, heat treatment furnaces (lots of sensors for temperature, pressure, atmosphere chemical composition, oil baths, etc.), robots, etc. Most equipment is based on Siemens PLCs, though other brands exist of course.
  • Current tools to fetch and analyze machine process data are too difficult to use and maintain. E.g. a Kepware OPC proxy is used to send data to a data lake/database. To do this, we have to configure the Kepware proxy with each specific signal, its data type, and how and where to send it. This is something production engineers can't do themselves; they have to request it from our internal IT department.
  • Trying to find better ways to provide actual machine data so that we can be much more agile. But also provide maintenance department with better information so they can do predictive/condition based maintenance instead of time/schedule based maintenance.
  • Goals are to be able to provide information to process engineers so they can optimize their production machines/lines and part quality, and provide maintenance department with enough information so they can develop much more efficient maintenance plans. Machine learning and AI needs as much information as possible to be effective.
  • Monitoring would have to be done remotely over Ethernet. We need a node that can fetch the data and deliver it to a parent. We possibly need several such nodes, since there are several hundred machines and a machine can have 1,000-30,000 "tags" that could be monitored. The collector should be able to access several OPC UA servers (machines); otherwise we'd need a swarm of collectors, which would need more resources and would be harder to maintain as an infrastructure.
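
The last point above (one collector reaching several OPC UA servers instead of a swarm of collectors) could look roughly like the sketch below. It is only an illustration under stated assumptions: the endpoint names are made up, and `read_tags` is a hypothetical stand-in that returns fake values instead of talking to a real server. It shows a single process polling several endpoints concurrently with Python's standard `asyncio` module.

```python
import asyncio

# Hypothetical endpoints; real ones would come from the collector's configuration.
ENDPOINTS = [
    "opc.tcp://machine-01:4840",
    "opc.tcp://machine-02:4840",
    "opc.tcp://machine-03:4840",
]


async def read_tags(endpoint: str) -> dict[str, float]:
    """Hypothetical stand-in for a real OPC UA read; returns fake values."""
    await asyncio.sleep(0.01)  # simulates network latency
    return {"ns=2;s=Temperature": 21.5, "ns=2;s=Pressure": 1.2}


async def poll_all(endpoints: list[str]) -> dict[str, dict[str, float]]:
    """Poll every endpoint concurrently from a single process."""
    results = await asyncio.gather(*(read_tags(e) for e in endpoints))
    # gather() preserves input order, so zip pairs each endpoint with its result.
    return dict(zip(endpoints, results))


snapshot = asyncio.run(poll_all(ENDPOINTS))
for endpoint, tags in snapshot.items():
    print(endpoint, len(tags), "tags")
```

In a real collector the stub would be replaced by calls to an OPC UA client library, and the number of endpoints per process would have to be tuned against the tag counts mentioned above.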

cc: @ktsaou @cakrit @sashwathn @amalkov @ralphm

@amalkov

amalkov commented Aug 31, 2022

I believe this is a good opportunity to step into the manufacturers' ecosystem. The outcome of this work could be a paid support plan. It would be good to analyse the effort and implementation complexity.

We probably just need to implement a couple of collectors and let the work be driven by the community, to validate the need.

@shyamvalsan
Author

If we build the collector and have a guide to using it - we could test the waters by sharing it with https://www.reddit.com/r/PLC/ and see how the community receives it.

@ilyam8
Member

ilyam8 commented Sep 6, 2022

@thiagoftsm can you share your thought before starting to implement something? atm I have 0 understanding of what OPC UA is and what the ways to collect metrics are, but I googled go opc ua and found https://github.com/gopcua/opcua.

@thiagoftsm

> @thiagoftsm can you share your thought before starting to implement something? atm I have 0 understanding of what OPC UA is and what the ways to collect metrics are, but I googled go opc ua and found https://github.com/gopcua/opcua.

Thank you for the link @ilyam8! As soon as I finish the eBPF work I am doing right now, I will share data and details about what we can do 🤝.

@thiagoftsm

thiagoftsm commented Sep 7, 2022

@shyamvalsan the Python examples you used are not async examples; we will have to use the async version of the OPC UA library instead: https://github.com/FreeOpcUa/opcua-asyncio.

I know we will write the collector in Go; I am only pointing out that OPC servers have two modes.

@thiagoftsm

@shyamvalsan about the OPC UA metrics: it looks like fetching everything from the server is not recommended, because the protocol was not designed for this, as you can see here, and here.

@thiagoftsm

thiagoftsm commented Sep 12, 2022

Hello,

Last week I finished the work with Python to understand how OPC UA works (server, client, protocol). This week I am shifting to Go, because the Python library mentioned in the OP has limitations that do not allow us to get all the metrics we need, and of course the plugin will be written with another library.

During the Python development I observed that:

  • The number of metrics we can collect is huge according to the documentation, and the documentation does not show all the possible namespaces a user can have.
  • Metrics are delivered as a dictionary, as you can see in this example.
  • @shyamvalsan right now I am assuming that we will use some values from namespace 0 and probably more metrics from namespaces with IDs two or higher. If users want to collect everything, we would probably need a dashboard per PLC, because some Siemens PLCs have 30000 values.
  • There are a few issues to be addressed before starting development:
    • Collect real data (I will get this from our user).
    • How are we going to organize namespaces on the dashboard?
    • We cannot assume that all servers will have the same set of values in namespace zero, but we know the variables that can be there. Which values are we going to plot?
    • Netdata cannot be installed on all the hardware that runs OPC servers. How are we going to organize data collected from different PLCs?
    • @stelfrag is there any plan to remove the current limitation on storing and retrieving thousands of metrics from our database?
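
To make the namespace question above more concrete, here is a small sketch of how NodeId strings could be split by namespace index, separating the server-defined nodes in namespace 0 from user namespaces with index two or higher. It relies only on the standard NodeId string form (`ns=<index>;<type>=<identifier>`, where the `ns=0;` prefix may be omitted); the sample NodeIds and the function names are hypothetical, and this is not how any existing collector works.

```python
import re
from collections import defaultdict

# String form of an OPC UA NodeId: "ns=<index>;<type>=<identifier>", where the
# type is i (numeric), s (string), g (GUID) or b (opaque byte string).
# The "ns=<index>;" part may be omitted, in which case the namespace is 0.
NODE_ID_RE = re.compile(r"^(?:ns=(\d+);)?([isgb])=(.+)$")


def parse_node_id(text: str) -> tuple[int, str, str]:
    """Return (namespace index, identifier type, identifier)."""
    match = NODE_ID_RE.match(text)
    if match is None:
        raise ValueError(f"not a NodeId string: {text!r}")
    namespace, id_type, identifier = match.groups()
    return int(namespace or 0), id_type, identifier


def group_by_namespace(node_ids: list[str]) -> dict[int, list[str]]:
    """Group NodeId strings by their namespace index."""
    groups: dict[int, list[str]] = defaultdict(list)
    for node_id in node_ids:
        namespace, _, _ = parse_node_id(node_id)
        groups[namespace].append(node_id)
    return dict(groups)


# Hypothetical sample; real NodeIds would come from the server.
sample = ["ns=0;i=2258", "i=2259", "ns=2;s=Furnace.Temperature", "ns=3;i=1001"]
groups = group_by_namespace(sample)
server_nodes = groups.get(0, [])
user_nodes = {ns: ids for ns, ids in groups.items() if ns >= 2}
print(server_nodes)
print(user_nodes)
```

Each key of `user_nodes` could then map to its own dashboard section, which is one possible answer to the organization question, not a decided design.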

Best regards!

@shyamvalsan
Author

@thiagoftsm

  • Regarding namespaces it appears only 0 and 1 can be known in advance by Netdata. The rest is up to the user to configure if they want to monitor.

image

I was thinking that namespaces should be correlated to jobs, so that each namespace will have a separate section in Netdata to themselves.

  • Regarding collecting everything and whether that will be too many metrics: is 30000 the number of possible values, or the number of actually useful values that a user would want to monitor continually? I think the agent should figure out a way to ignore constant metrics or empty "tags", so in practice the number may be lower per PLC (but this is just a hypothesis of mine and could be wrong).

  • IMO multiple PLCs should be treated as separate instances, and data coming from them should be aggregated under a namespace on composite charts.
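
The idea of ignoring constant metrics or empty "tags" could be sketched as below. This is only one possible heuristic, under the assumption that the collector samples every tag during a short warm-up window before deciding what to chart; the tag names, the sample values and the `is_worth_charting` helper are all hypothetical.

```python
def is_worth_charting(samples: list) -> bool:
    """True if a tag produced at least two distinct non-empty values."""
    seen = {value for value in samples if value is not None}
    return len(seen) > 1


# Hypothetical samples from a warm-up window, keyed by NodeId string.
window = {
    "ns=2;s=Spindle.Speed": [1200, 1210, 1195],
    "ns=2;s=Firmware.Build": [42, 42, 42],         # constant
    "ns=2;s=Spare.Register": [None, None, None],   # empty
}

active = [tag for tag, samples in window.items() if is_worth_charting(samples)]
print(active)
```

A constant value can still be meaningful (a setpoint, for example), so a real collector would probably let the user override the filter per tag.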

@thiagoftsm

@shyamvalsan after I discuss your points with users, I will bring another update.

@thiagoftsm

During the tests I reached an OPC UA server that does not allow querying all nodes. Given this scenario, the safest option looks to be querying IDs that are always present. The whole list is available in this link, with the prefix UA_NS0ID.
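
As an illustration of "IDs that are always present", the sketch below builds NodeId strings for a handful of standard namespace 0 nodes. The numeric identifiers are the standard ones as far as I know (the `UA_NS0ID_` prefix is the naming used by, for example, the open62541 C library), but they should be checked against the linked list before relying on them. The selection of nodes and the `ns0_node_id` helper are assumptions made for this example.

```python
# A few numeric identifiers from OPC UA namespace 0. Values are believed to
# match the standard NodeId list; verify them against the linked list.
WELL_KNOWN_NS0 = {
    "Server_NamespaceArray": 2255,
    "Server_ServerStatus_StartTime": 2257,
    "Server_ServerStatus_CurrentTime": 2258,
    "Server_ServerStatus_State": 2259,
    "Server_ServiceLevel": 2267,
}


def ns0_node_id(name: str) -> str:
    """Format a well-known namespace 0 node as a NodeId string."""
    return f"ns=0;i={WELL_KNOWN_NS0[name]}"


# NodeId strings in the same form the read example accepts with -node.
node_ids = [ns0_node_id(name) for name in WELL_KNOWN_NS0]
for node_id in node_ids:
    print(node_id)
```

Reading `Server_NamespaceArray` first would also tell the collector which namespaces with index two or higher exist on a given server.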

@thiagoftsm

Since my last message I have run different tests with different OPC servers and with a PLC emulator developed by Microsoft. For the latter, I ran it with the following arguments:

docker run --rm -it -p 50000:50000 -p 8080:8080 --name opcplc mcr.microsoft.com/iotedge/opc-plc:latest --pn=50000 --autoaccept --sph --sn=5 --sr=10 --st=uint --fn=5 --fr=1 --ft=uint --ctb --scn --lid --lsn --ref --gn=5 --ut --aa --to

When I requested all variables from the Microsoft PLC, I got this result using the Python library, because the Go client does not allow me to connect to any server and request all nodes (ns=0;i=84):

bash-5.1$ go run examples/read/read.go -endpoint opc.tcp://localhost:50000 -node 'ns=0;i=84'
Status not OK: The attribute is not supported for the specified Node. StatusBadAttributeIDInvalid (0x80350000)
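
A possible explanation for the error above (an assumption, not confirmed in this thread): ns=0;i=84 is the Root folder, which is an Object node, and Object nodes have no Value attribute, so a read of the Value attribute fails with BadAttributeIdInvalid. Listing nodes is normally done with the Browse service rather than Read. The sketch below mimics that difference with an in-memory tree; the `Node` class, the helper names and the two ns=2 tags are hypothetical, and no OPC UA library is involved.

```python
from dataclasses import dataclass, field

BAD_ATTRIBUTE_ID_INVALID = 0x80350000  # status code seen in the output above
GOOD = 0


@dataclass
class Node:
    node_id: str
    node_class: str                # "Object" or "Variable"
    value: object = None           # only meaningful for Variable nodes
    children: list["Node"] = field(default_factory=list)


def read_value(node: Node):
    """Mimic a Value-attribute read: only Variable nodes have a value."""
    if node.node_class != "Variable":
        return BAD_ATTRIBUTE_ID_INVALID, None
    return GOOD, node.value


def browse_variables(node: Node) -> list[Node]:
    """Walk the hierarchy depth-first and collect every Variable node."""
    found = [node] if node.node_class == "Variable" else []
    for child in node.children:
        found.extend(browse_variables(child))
    return found


# Simplified, hypothetical address space: Root (i=84) -> Objects (i=85) -> tags.
root = Node("ns=0;i=84", "Object", children=[
    Node("ns=0;i=85", "Object", children=[
        Node("ns=2;s=Furnace.Temperature", "Variable", value=850.0),
        Node("ns=2;s=Furnace.Pressure", "Variable", value=1.2),
    ]),
])

status, _ = read_value(root)
print(hex(status))
print([n.node_id for n in browse_variables(root)])
```

If this reading is right, the Go client would need a browse starting at the Root or Objects folder, followed by reads of the Variable nodes it finds.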

As we discussed in our meeting, I am going to send an e-mail to our user requesting a real environment to test in, and I will also report the issue in the gopcua repo.

@Forza-tng

Hi, I just wanted to chime in on the value proposition. At my company we have lots and lots of UA-capable devices (CNC machines, robots, heat treatment, and other manufacturing equipment), and although there are plenty of commercial tools to gather data from these, they are usually focused on driving MES and ERP systems, or on gathering specific data.

The standard tools work well when we know exactly what data/signals we need. Then it is a matter of selecting the correct source and sending it to the right recipient/system.

My interest here is to find better ways to broadly gather data, visualise what the data looks like, and provide ways to quickly look through thousands of signals/data sources. Netdata is very capable and can easily graph thousands of metrics in an easy-to-use interface.

My goals are several.

  • Provide engineers with easy access to their machines' data so they can check performance and quality. This data is important from a warranty point of view, but also from a capability and performance point of view.
  • Provide maintenance departments with metrics that can help them change from mostly static maintenance scheduling to condition-based maintenance.
  • Provide detailed metrics for data scientists' use. There are many universities that we work with that want to do research in the automation field, but lack of data makes many research projects difficult to perform.
  • Develop machine learning tools that can better predict the resulting quality of produced parts based on the machine tool status.
  • .. and more =)

@thiagoftsm

@ilyam8 and @shyamvalsan I am adding here an example from a demo IoT environment. As you can see, the majority of the metrics are not defined and we won't use them.

Right now my expectation is that in a real environment, metrics not related to the server will be listed under a different namespace (ns=2 or higher), and we will focus our collection on them.
