Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] return object storage by value #9

Open
jedwards1211 opened this issue Sep 29, 2020 · 21 comments
Open

[Feature request] return object storage by value #9

jedwards1211 opened this issue Sep 29, 2020 · 21 comments
Assignees
Labels
enhancement New feature or request

Comments

@jedwards1211
Copy link
Contributor

jedwards1211 commented Sep 29, 2020

After what I described in #8, I thought maybe it would be possible to return an object storage:

  public static Length() Create(double value, LengthUnit unit) {
    Length() length;
    length.Set(value, unit);
    return length;
  }

But it causes this compiler error, which seems contradictory:

src/length.ci(23): ERROR: Cannot coerce Length() to Length()

However cito supports assigning one object storage to another:

    Length() length;
    Length() length2;
    length = length2; // no errors, output C++ is okay

Would it be possible for cito to support this by returning by value in C++?

Length Length::create(double value, LengthUnit unit)
{
	Length length;
	length.set(value, unit);
	return length;
}

And return a reference in other languages like JS?

create(value, unit) {
  const length = new Length();
  length.set(value, unit);
  return length;
}

I could make the C++ API much more convenient if I could have these static methods that return by value instead of returning shared pointers, or constructing and then initializing.

@jedwards1211 jedwards1211 changed the title return storage by value? [Feature request] return and assign object storage by value Sep 29, 2020
@jedwards1211 jedwards1211 changed the title [Feature request] return and assign object storage by value [Feature request] return object storage by value Sep 29, 2020
@pfusik
Copy link
Collaborator

pfusik commented Sep 29, 2020

Returning an object storage, in general, implies making a copy of the object. Java doesn't support that.

Factory methods should make a heap allocation:

public static Length# Create(double value, LengthUnit unit) {
    Length# length = new Length();
    length.Set(value, unit);
    return length;
}

Assigning object storages is not legal - I need to add an error message.

@pfusik
Copy link
Collaborator

pfusik commented Sep 29, 2020

If I understand correctly, you are asking for moving objects:

static Length() Create()
{
    Length() localObj;
    return localObj; // OK because localObj is about to be destroyed
}

Length() objStorage = Length.Create(); // OK
objStorage = Length.Create(); // should it be legal?

This seems possible.

Advantages:

  1. Nicer C++ API.

Drawbacks:

  1. Makes Ć more complicated.
  2. In C, the struct must be exposed in the header file. Currently the structures are opaque.

@jedwards1211
Copy link
Contributor Author

Right that's what I'm asking for...
That seems to be beyond my knowledge of how C works, in any case do you want the structs to be opaque to discourage people from messing with the internal data?

@jedwards1211
Copy link
Contributor Author

jedwards1211 commented Sep 29, 2020

Also in Java doing a shallow copy is pretty trivial at least, the object just needs to implement Cloneable and override Object.clone to make it public and call the super implementation. I understand that's not the same as a deep copy of a stack-allocated class in C++ though. I adding code to also clone object fields wouldn't be super complicated, though it would definitely add complexity to Ć.

@jedwards1211
Copy link
Contributor Author

jedwards1211 commented Sep 29, 2020

If the goal is to make every language output as idiomatic as possible though, it would be nice to have some way of writing an immutable object pattern that's idiomatic in every output language. Here's the C++ header I wrote for this object when I was experimenting with using SWIG to use the C++ code from other langs...an equivalent immutable class in Java/JS/Python etc can be passed around by reference without any problems, no need to copy it.

I guess if there were a specific way to declare a Ć class immutable, you could avoid the need for copying in Java etc.

#ifndef UNITIZED_LENGTH_H
#define UNITIZED_LENGTH_H

#include "angle.h"

namespace unitized {

class Length
{
public:
    enum Unit {
        Meters = 1,
        Centimeters = 2,
        Kilometers = 3,
        Feet = 4,
        Yards = 5,
        Inches = 6,
        Miles = 7
    };

    Length(double value, Unit unit);

    static Length meters(double value);
    static Length centimeters(double value);
    static Length kilometers(double value);
    static Length feet(double value);
    static Length yards(double value);
    static Length inches(double value);
    static Length miles(double value);

    static Angle atan2(Length y, Length x);

    double convertTo(Unit unit) const;
    double toMeters() const;
    double toCentimeters() const;
    double toKilometers() const;
    double toFeet() const;
    double toYards() const;
    double toInches() const;
    double toMiles() const;

    Length as(Unit unit) const;
    Length asMeters() const;
    Length asCentimeters() const;
    Length asKilometers() const;
    Length asFeet() const;
    Length asYards() const;
    Length asInches() const;
    Length asMiles() const;

    Length add(Length addend) const;
    Length sub(Length subtrahend) const;
    Length mul(double multiplicand) const;
    Length div(double denominator) const;
    double divUnitless(Length denominator) const;
    Length mod(Length modulus) const;
    Length abs() const;
    Length negate() const;
    bool isFinite() const;
    bool isInfinite() const;
    bool isNaN() const;
    bool isNegative() const;
    bool isPositive() const;
    bool isZero() const;
    bool isNonzero() const;
    bool equals(Length other) const;
    int compareTo(Length other) const;

    const Unit unit;

private:
    const double value;

    static double convert(double value, Unit from, Unit to);
    static double fromBase(double value, Unit to);
    static double toBase(double value, Unit from);
};

} // namespace unitized

#endif // UNITIZED_LENGTH_H

@pfusik
Copy link
Collaborator

pfusik commented Sep 29, 2020

do you want the structs to be opaque to discourage people from messing with the internal data?

Explicit structures become part of the library interface.
This is especially important if the library is dynamically linked (dll/so).
Once you pass struct { int x; int y; } by value between your library and its users, you cannot add or remove its fields in new versions of your library.

@jedwards1211
Copy link
Contributor Author

I see. Is that not the case for adding/removing fields to C++ classes?

@vpicaver
Copy link

vpicaver commented Oct 1, 2020

To get around the adding / removing fields for a DLL, you have to make an internal data pointer inside of the class.

Here's a good example of the DLL memory alignment problem and the solution in Qt library for c++:
https://wiki.qt.io/D-Pointer

The data is still allocated on the heap, but the accessing class can be passed by value. The underlying D-Pointer is deep copied. Or if you're fancy, you can copy on write like https://doc.qt.io/qt-5/qshareddatapointer.html

@jedwards1211
Copy link
Contributor Author

I guess there's no way around the problem if you want pure stack allocation?

@vpicaver
Copy link

vpicaver commented Oct 1, 2020

Yea, not really. It's only really useful for types that you know won't ever change.

If you change the type, the dependent applications need to rebuild.

@pfusik
Copy link
Collaborator

pfusik commented Oct 1, 2020

Yes, there's inconsistency between the generated C and C++ code:

  • C uses opaque pointers and only allows the users to allocate on the heap. That felt the right way, promoting version compatibility for dynamic libraries. Also, it wasn't hard to implement because the functions accept pointers anyway.
  • C++ exposes all the classes. This was the straightforward approach. I don't have much experience with C++ libraries other than Qt, which I'm aware uses the Pimpl pattern as described by vpicaver. And despite that, Qt has had breaking changes.

What do you think? Should cito emit Pimpl for C++ too?

@jedwards1211
Copy link
Contributor Author

jedwards1211 commented Oct 1, 2020

I do like how clean the syntax is with a lot of Qt classes since they use Pimpl; it's nice not to have to declare variables as shared pointer types. Though for this particular use case I'm going for, heap allocation feels like a waste (and I wouldn't expect to change the fields of this particular class so I wouldn't have an issue with DLLs either). But I know there are a lot of concerns to balance here...

@jedwards1211 jedwards1211 mentioned this issue Oct 2, 2020
@jedwards1211
Copy link
Contributor Author

Is there not any kind of compiler/linker flag to automatically generate Pimpl for classes that will be exported in a DLL? Seems like something that doesn't need to be done in userland...

@jedwards1211
Copy link
Contributor Author

That or, I'm surprised dynamic linking wasn't designed to read the sizes of exported classes from the DLL. Why hardcode the sizes of the version you compiled against, is it for performance's sake?

@pfusik
Copy link
Collaborator

pfusik commented Dec 14, 2020

Is there not any kind of compiler/linker flag to automatically generate Pimpl for classes that will be exported in a DLL?

No. I only found:

That or, I'm surprised dynamic linking wasn't designed to read the sizes of exported classes from the DLL.

sizeof(YourClass) is a compile-time constant in C++. Cannot import it from a DLL therefore.

My experience with dynamic linking C++ code is that all the entry points are extern "C" functions.
Classes are allocated with a factory method and you interface with them through an abstract class interface.
E.g.

class Foo
{
  public:
    virtual ~Foo() = 0;
    virtual void doBar() = 0;
};

extern "C" Foo *allocateFoo(void);

I think nothing stops cito from emitting such an interface. To mantain backwards compatibility, methods must not be removed, have their signature changed (these two apply to all other targets), but also the public methods cannot be reordered, which might be surprising to developers coming from other languages.

@pfusik
Copy link
Collaborator

pfusik commented Mar 12, 2021

Back to the original problem. The argument of best performance in C and C++ is important. I think I can tolerate the two drawbacks mentioned above.

Consider:

class Vector3D
{
    float X;
    float Y;
    float Z;

    // named constructor
    static Vector3D() Create(float x, float y, float z)
    {
        Vector3D() result;
        result.X = x;
        result.Y = y;
        result.Z = z;
        return result;
    }

    Vector3D() Add(Vector3D other)
    {
        return Create(X + other.X, Y + other.Y, Z + other.Z);
    }

    static void Demo()
    {
        Vector3D() a = Create(1, 2, 3);
        Vector3D() b = Create(5, 10, 15);
        Vector3D() c = a.Add(b);
        // c = a; inconsistent if references to c existed before
        Vector3D() d = b; // valid if b won't be used afterwards
    }
}

Care must be taken so that semantics stays same in all backends. I wouldn't like objects copied in some languages but only references copied in the other.
We can adopt move semantics from C++.
In the Create method C++ applies Return Value Optimization (RVO) so that there is no copy. This is because the return value is a local storage that ceases to exist. Hence no copy.
The Add method returns the result of Create. No copy needed.
Finally, it is safe to initialize storage with the result of method that returns storage. No copy either.

Generally, if the source is storage that is not used afterwards, it is moved, so should be accepted as an initializer or a return value.

Now for the price paid, aka disadvantages:

  1. This doesn't look like a complex extension of the language. There is even no extra syntax needed. I'm happy we can avoid C++ r-value references, move constructors, forwarding references, perfect forwarding etc. because we have no copy constructors in the first place to maintain compatibility with.

  2. Exposing C structures in the header:
    a. We can restrict the feature to non-public methods.
    b. We can emit the structures that are returned by value. Structures such as Vector3D aren't expected to evolve.

@pfusik
Copy link
Collaborator

pfusik commented Mar 12, 2021

Users messing with fields directly isn't a problem for Vector3D either. This can even be viewed as a feature.

@pfusik pfusik self-assigned this Mar 12, 2021
pfusik added a commit that referenced this issue Mar 31, 2021
@pfusik
Copy link
Collaborator

pfusik commented Mar 31, 2021

Added initial implementation. It has no diagnostics to prevent you from accidentally making a copy, so use with care!

There are rough edges for the C target:

  • invalid C emitted if a returned object is passed as a pointer argument (gcc says error: lvalue required as unary '&' operand)
  • structures not exported to the public header if needed

@pfusik
Copy link
Collaborator

pfusik commented Sep 13, 2021

invalid C emitted if a returned object is passed as a pointer argument

Fixed last week. See https://github.com/pfusik/ray-ci for a small sample.

It exposed a new problem: if you pass objects by value, you might want to eventually assign them to a field. Which is troublesome in Ć, because it's not obvious how to implement object assignments in most of the languages. C and C++ support it out of the box. The others don't. I see two options:

  1. Assign objects by assigning field-by-field.
    a. Using a cito-generated "assign" method.
    b. By inlining the field assignment (ugly).

  2. Assign references and cope with it: object storage references are no longer read-only. This requires a special syntax and the programmer must be aware that references get invalidated by the assignment.

For ray-ci I went the easy path "2". The special syntax is: Class() Field = null, but it's not really a null reference, but rather an undefined initial value. Other options considered:

  • Class()! Field (similar to read-write references, however here it would mean something different: you could reassign objects. Object storage fields could be reassigned both with and without the exclamation mark.
  • Class()? Field (think "optional"). Perhaps a good idea, especially if references are extended with explicit nullability (C#-style).

Either way, reassignable object storage is emitted in C# and Java without the initial value or the readonly/final modifier.

This is still work-in-progress (e.g. the spec is not updated yet).
Comments are welcome!

@jedwards1211
Copy link
Contributor Author

jedwards1211 commented Sep 13, 2021

Ah cool!

I do hope to eventually get back to working on Cito stuff 😅

I guess I was assuming a return by value in Cito could be converted to a return by value in C/C++, but converted to a return by reference in other languages. But maybe that would cause problems?

In any case this relates to how I wish there were an immutable class type in Cito. An immutable class could be passed around by value in C/C++ and by reference in other languages without risks (assuming the dev abides by the immutable paradigm in their own code in the consuming language)

@pfusik
Copy link
Collaborator

pfusik commented Sep 18, 2021

I guess I was assuming a return by value in Cito could be converted to a return by value in C/C++, but converted to a return by reference in other languages.

This is how I implemented it.

In any case this relates to how I wish there were an immutable class type in Cito. An immutable class could be passed around by value in C/C++ and by reference in other languages without risks (assuming the dev abides by the immutable paradigm in their own code in the consuming language)

Immutable class type, interesting! That should work. So far I thought about C++-like move semantics.

@pfusik pfusik added the enhancement New feature or request label Aug 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants