Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API addition: Support strings more cleanly across all PLC types #162

Closed
kyle-github opened this issue Jul 22, 2020 · 23 comments
Closed

API addition: Support strings more cleanly across all PLC types #162

kyle-github opened this issue Jul 22, 2020 · 23 comments

Comments

@kyle-github
Copy link
Member

Support strings as a "native" type of the library, in the API. For example:

const char *plc_tag_get_string(int32_t tag_id, int offset);
int plc_tag_set_string(int32_t tag_id, int offset, const char *string_val);

As with other calls, the offset parameter is in bytes. For the getter, return NULL on failure either due to lack of data, an out of bounds offset or other error.

Note that this will require some sort of internal byte swapping and other handling to be done as some PLCs return raw string data in various forms and orders.

Challenges:

  1. PLCs that return only the valid string bytes may not have a simple way to determine the maximum string size. At least with AB PLCs, strings are 82 characters except on Micro800 where they appear to be 255 characters. There are also packet size limitations that may impact this.
  2. How to handle arrays of strings? PLCs that allow writes to arrays of strings may require significant changes to the representation in memory before writing the values to the PLC. PLC/5 and Control/CompactLogix appear to transmit the full 82 characters in all cases.
  3. This is a core library feature that would interpret values that were not scalar values or arrays. This breaks the existing pattern of the library performing just the core functionality and it thus arguably not appropriate for the core library.
@kyle-github
Copy link
Member Author

How would the offset be used in the case of string arrays? Set elem_size to let the application know how big a fixed size string is?

While low friction for apps, this may be too high level for the core library.

@kyle-github
Copy link
Member Author

Another possible variant of this would be to rewire the existing getters and setters to handle the byte swapping etc. automatically. This gets challenging when a string is a field within a UDT tag. That alone argues for new API calls.

@kyle-github
Copy link
Member Author

Perhaps a different, slightly lower level set of API calls:

int plc_tag_get_string_length(int32_t tag_id, int string_start_offset);
int plc_tag_set_string_length(int32_t tag_id, int string_start_offset, int string_len);
int plc_tag_get_string_char(int32_t tag_id, int string_start_offset, int char_index);
int plc_tag_set_string_char(int32_t tag_id, int string_start_offset, int char_index, int char_val);

This is a little clunky, but does not require any additional memory allocation or in place data transformation. The tag buffer would still contain unchanged raw PLC data. This would allow configuration of the size of the count element, if there was one so that it can be translated in get/set string size. It would allow handling for byte-swapping of the string character indexes.

It abstracts the underlying byte representation sufficiently that generic code can be written to work with any PLC type at the application level.

@timyhac
Copy link

timyhac commented Jul 22, 2020

Yep looks good, are UTF-16 or 32 ever used?

@kyle-github
Copy link
Member Author

There are string types that support wide characters, which might be some form of UTF-16, but I have not seen them in practice. I think we can continue to support them using the same API. UTF-32 seems to be non-existent. I have never seen it used.

We can relax the restrictions on plc_tag_get_string_char() to return 0-X where X is 255 or 65535 depending on the char size. Negatives are still error return codes.

@kyle-github
Copy link
Member Author

There may be some need to determine the maximum capacity of a string in characters. For instance PLC/5 through ControlLogix support strings up to 82 characters. However, I think that Micro800 PLCs may support up to 255. This is very much PLC specific. We also need a way to skip past a string.

So at least two more API functions:

int plc_tag_get_string_capacity(int32_t tag_id, int string_start_offset);
int plc_tag_get_string_total_length(int32_t tag_id, int string_start_offset);

Again, there are some caveats. The string values in the raw data from a tag listing do NOT conform to a fixed size. I really want to find a solution that will work for those as well. If I can figure that out, then even tag listing is almost completely supported by the library rather than partially as now.

This needs more thought. There is a bit of an API explosion due to strings and I am not sure that this makes sense without also thinking about UDTs in general.

@timyhac
Copy link

timyhac commented Jul 22, 2020

@kyle-github - one option here we could use is to scope out the feature using a wrapper, and then as it becomes clear what to do, swap out the functionality into core project.

I'm not super familiar with it but web browsers do this with javascript polyfills.

What do you think?

@timyhac
Copy link

timyhac commented Jul 22, 2020

We could even release it as a separate package so it doesn't cloud up the "real" wrapper.

@kyle-github
Copy link
Member Author

@timyhac I tend to agree with the idea of "prototyping" it in a wrapper library first.

Right now I am going down a bit of a rabbit hole. I had some thoughts based on the string API above, and realized that I was starting toward a useful extension that would support a lot of UDT use. I am still thinking through it. Hopefully a bit later today I will have something to show and then I can see if it is worth it or not.

@kyle-github
Copy link
Member Author

I am dropping the UDT ideas. They'll work and will be feasible but they introduce two completely different APIs. They should be done as either a different library based on the core library or as a parallel API. And the cost of supporting generic UDTs is quite high.

I am leaning toward string support in the core library. I do like the idea of prototyping it either in a wrapper or in higher-level C extension library. My reasoning is this:

Pro:

  • Strings are fundamental types in most PLCs in some manner or another.
  • While they are structured, they tend to be in a relatively small number of formats. A half-dozen with some minor variations will probably cover more than 95% of all cases.
  • There are some fiddly bits such as the endian-ness of the count word, the size of the count word, byte-swapped character data, padding bytes etc. that everyone will have to solve. That argues for core support.
  • If not in the core then every wrapper will need to implement support for strings.

Con:

  • Strings can be set up as UDTs but with non-standard lengths thus possibly breaking built-in string support.
  • Because of the need to keep memory allocation and ownership cleanly separated between the library and applications, the simple solution of taking and returning C strings is not feasible. This means that the API will not be very ergonomic compared to solving it in each wrapper.

Overall I think the argument that every wrapper will need to implement something is a strong one. While the API I have above will not be the easiest to use directly, it will make wrapper library implementations of easy string use much nicer and hopefully not PLC specific. But it is not as clear cut as the difference between UDTs and primitive types.

I think the API still needs some thought.

@kyle-github
Copy link
Member Author

The frustrating thing here is that the original API idea using C strings is so much easier to use. The problem with it is memory handling. Writing code using the in-place character-oriented API functions is not much different from the existing API. There is less knowledge required of the PLC data representation, which is the whole point.

@kyle-github
Copy link
Member Author

I think I am overthinking this to some extent. There are only a few different attributes of strings that are common:

  • is the string a fixed capacity string? If so, what is the capacity?
  • is the string a counted string or a sentinel terminated string?
  • if it is counted, how big is the count word?
  • how many padding characters/alignment?

The key thing that makes this simple is that all this only needs to apply per tag. I do not have to solve this for all tags at once. Each tag has a specific string representation. There could be UDTs with multiple string types included, but I am perfectly fine handing that right back to the end application creator as their problem to solve. Just handling this with one string representation per tag solves the 95% case easily.

This also lets me solve the tag listing raw data problem I wanted to solve. For instance string_format=c16. That could mean that the string is a counted string using a 16-bit length field, and defaults to variable length with no padding. That would match the raw tag listing data. Another example: string_format=l82,c16,p2. That would be:

  • fixed capacity of 82 bytes of data.
  • a 16-bit count.
  • 2 bytes of padding at the end.

That would describe a PCCC string. The format string_format=l82,c32,p2 would describe a *Logix STRING type. Micro800 might be string_format=c8 but that is not clear yet.

Suitable defaults for the string format would work for most cases for all tags in a PLC.

This makes adding six more API functions more palatable.

@timyhac
Copy link

timyhac commented Jul 24, 2020

To me the big challenge is to support variable-size tags. Supporting strings themselves aren't a major issue (even if they come in various formats) - it is that we don't know what the elementSize is (on some CPUs), this issue is compounded for arrays or structures.

Is my understanding correct here?

@kyle-github
Copy link
Member Author

Yes, that is one of the driving problems. At least with Control/CompactLogix systems, the elem_size is something we can skip when defining the tag. The PLC will let us know how big the tag is simply by returning all the elements we ask for.

The driving problem for me on this is handling the raw data from tag listing. That has variable length strings that do not have a constant capacity allocated. The string is exactly the length specified without any extra capacity.

I think I can make the string support functions I have above (the six character-based ones) work with both constant capacity strings (aka STRING data type) and compact variable length counted strings as we see in the raw tag listing data. But the latter is where I need to have the plc_tag_get_string_total_length() function. Without that, you cannot skip to the next tag entry.

I do not want to get sucked back into the UDT rabbit hole, so I want to keep this a simple as possible. I think just supporting one string format per tag is going to solve most of the cases. It works for the tag listing raw data. It works for arrays of STRING. It works for many UDTs where either there is a single string type or the user set up the UDT using the system STRING type. It won't work for UDTs where there is more than one string type in use at the same time.

For the time being, we can pre-define some common string types so that the string_format attribute can be simple:

  • string_format=AB_LGX_STRING - Set up the string type with the necessary defaults for a string with a 4-byte count word, non-byte-swapped byte-length character data and 82 byte capacity with 2 bytes of padding.
  • string_format=AB_PCCC_STRING - Set up the string type with defaults for a 2-byte count word, byte swapped character data, 82-byte capacity, and no padding.
  • string_format=VAR_STRING_C2 - Set up a string type with variable length and a 2-byte count word, no fixed capacity and no padding.

Just those three would solve almost all string handling for AB PLCs. I came up with another syntax for this for my UDT ideas and could borrow it here. These three would be:

  • string_format=u8[u32:82]+2 for the Logix-class strings.
  • string_format=u8[u16:82:1,-1] for the PLC/5-class strings. The 1,-1 part is the byte order and would be repeated as needed.
  • string_format=u8[u16] for the raw tag listing data.

At least initially, I think the named string types will work fine as I determine the best way to describe them internally.

@kyle-github
Copy link
Member Author

This has been implemented and tested as best I can in the add_string_api branch. I will keep that branch up to date with regards to the prerelease branch until the API is ready to be used.

@kyle-github
Copy link
Member Author

And this went nowhere...

It is just not simple enough. I have determined how to delete memory allocated in the native library. So I can implement something like the original API. The new API would look like this:

char *plc_tag_get_string(int32_t tag_id, int string_start_offset);
int plc_tag_set_string(int32_t tag_id, int string_start_offset, const char *string_val);
int plc_tag_get_string_capacity(int32_t tag_id, int string_start_offset);
int plc_tag_get_string_total_length(int32_t tag_id, int string_start_offset);

Any language wrapper would be responsible for freeing the memory allocated in the core library in the first function above. Nearly every language has some form of FFI that will handle C strings, so we do not need functions to get or set the string length. That can be done implicitly via the length of the C string.

We need the third and fourth functions as those can vary even within a single PLC. For example, when listing tags in a ControlLogix, the symbol names are counted strings using a 16-bit count and variable length string data bytes. This is very different from a STRING type in the same PLC!

@kyle-github kyle-github reopened this Oct 31, 2020
@kyle-github
Copy link
Member Author

So that is not going to work either. C#/.Net makes this hard as you cannot easily call free() to free up malloc()ed memory.

Perhaps put all the memory handling in the wrapper?

int plc_tag_get_string(int32_t tag_id, int string_start_offset, char *buffer, int buffer_size);
int plc_tag_set_string(int32_t tag_id, int string_start_offset, const char *string_val);
int plc_tag_get_string_capacity(int32_t tag_id, int string_start_offset);
int plc_tag_get_string_total_length(int32_t tag_id, int string_start_offset);

First you call plc_tag_get_string_capacity() to get the maximum capacity. Then you create a byte buffer sufficiently large to handle that. Then you call plc_tag_get_string() with that buffer. You get back a value that is the length of the string, or a negative number that is an error (i.e. PLCTAG_ERR_TOO_SMALL if there is insufficient space).

Another option is to expose a free function:

char *plc_tag_get_string(int32_t tag_id, int string_start_offset);
int plc_tag_free_string_buffer(char *buffer);
int plc_tag_set_string(int32_t tag_id, int string_start_offset, const char *string_val);
int plc_tag_get_string_capacity(int32_t tag_id, int string_start_offset);
int plc_tag_get_string_total_length(int32_t tag_id, int string_start_offset);

That allow you to free without knowing how long the string is in advance.

@kyle-github
Copy link
Member Author

As surfaced in the .Net wrapper discussion, setting one byte at a time to zero out the remaining bytes of a string might be fairly slow/heavy on platforms with unoptimized mutexes. So add a function to the API to set a range of bytes just like memset(). That makes the final string API this:

/* base string API */
int plc_tag_get_string(int32_t tag_id, int string_start_offset, char *buffer);
int plc_tag_set_string(int32_t tag_id, int string_start_offset, const char *string_val);
int plc_tag_get_string_capacity(int32_t tag_id, int string_start_offset);
int plc_tag_get_string_total_length(int32_t tag_id, int string_start_offset);

/* byte range/raw byte API */
int plc_tag_clear_bytes(int32_t tag_id, int start_offset, int range_length, uint8 val);
int plc_tag_get_bytes(int32_t tag_id, int start_offset, int range_length, uint8_t *destination_buffer);
int plc_tag_set_bytes(int32_t tag_id, int start_offset, int range_length, uint8_t *source_buffer);
  • plc_tag_get_string(int32_t tag_id, int string_start_offset, char *buffer) copies the string characters into buffer for the string starting at string_start_offset. For the purposes of the copy, if the string type of the tag contains a count word, that word is skipped and only the data part of the string is copied. The characters are ordered in regular string ordering for the platform rather than PLC ordering (which can be byteswapped on some PLCs). It is the responsibility of the calling wrapper/application to allocate a buffer sufficiently long to contain the string and the NUL string terminator, and to dispose of the buffer when the application is complete. This function returns:

    • PLCTAG_STATUS_OK - returned on success.
    • PLCTAG_ERR_NULL_POINTER - returned when the passed buffer is null.
  • int plc_tag_set_string(int32_t tag_id, int string_start_offset, const char *string_val) copies the passed string into the string data structure in the tag buffer. The function will clear all possible string memory to the zero byte and set the string length correctly if the string is a counted string. This function returns:

    • PLCTAG_STATUS_OK - returned on success.
    • PLCTAG_ERR_TOO_LARGE - returned when the amount of data in the string will not fit in the string type in the tag buffer.
    • PLCTAG_ERR_NULL_POINTER - returned when the passed string is null.

To be continued...

@kyle-github
Copy link
Member Author

I'll probably do this in two stages. The first will implement just the string part of the API. If there is any actual need then I will implement the second part with the raw byte handling. Once we have strings, we will be able to directly handle almost all primitive types so the overhead of the mutex will not be too terrible.

@kyle-github
Copy link
Member Author

There should be a change to the getter. It makes sense to have the getter take a parameter for the length of the buffer from the application. One open question is whether the getter should zero-terminate the string or not. Since the intent is to provide a C string, then it seems like the getter should. But at the same time, the string length is known to the application code so it can zero terminate. The UX for having the getter terminate is better so I am leaning that way.

Here is what the getter would look like with the buffer length:

int plc_tag_get_string(int32_t tag_id, int string_start_offset, char *buffer, int buffer_length);

This allows the application/wrapper to help ensure that the buffer is not overrun. The guarantee by the library will be that no more than buffer_length bytes are copied. This is where the zero termination comes in. Wrappers will need to know about C string formats. That is not particularly onerous, however.

@kyle-github
Copy link
Member Author

kyle-github commented Nov 26, 2020

In progress. Should be done soon. Might need to add an additional API function to get the string length.

@timyhac
Copy link

timyhac commented Jan 31, 2021

I think this issue can be closed, right?

@kyle-github
Copy link
Member Author

D'oh, yes. Yes, indeed. Thanks for the catch!

Closed with PR #227.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants