Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Adopting a C++ style guide #22710

Open
seiko2plus opened this issue Dec 2, 2022 · 14 comments
Open

ENH: Adopting a C++ style guide #22710

seiko2plus opened this issue Dec 2, 2022 · 14 comments
Labels
C++ Related to introduction and use of C++ in the NumPy code base

Comments

@seiko2plus
Copy link
Member

seiko2plus commented Dec 2, 2022

Adopting a C++ style guide:

The current approach to using C++ is somewhat unclear. While trying to review #22315, I faced several obstacles, and I couldn't determine which style/rules I should follow; C++ is a monster of features compared to the C language. We urgently need to impose a specific code style to avoid chaos.

I suggest adopting the latest Google C++ Style guide, with minor modifications limited to the following:

  • C++ Version: code should target C++14 , or maybe keep it as-is C++17. Would it be possible? However, keep targeting C++11 in 2023 is too rigorous.

  • Spaces vs. Tabs: indent four spaces at a time.

  • Function Declarations and Definitions: similar to our C style, except the return type goes to the same line next to the function name.

@seiko2plus seiko2plus added the C++ Related to introduction and use of C++ in the NumPy code base label Dec 2, 2022
@charris
Copy link
Member

charris commented Dec 2, 2022

Thanks for starting on this.

  • Version choice: I'll leave to you and the SciPy folks.
  • Indentation: four space sounds good
  • Function Definitions: Both choices have advantages. I like the same line version for short class methods, but prefer two lines for stand alone functions. C++ tends to long lines, so shortening helps. But either would be acceptable. I think same line is more generally popular.

Long term, I would like the style to be defined by the .clang-format file, which is currently based on the google style with modifications motivated by our C style guide. The tricky (ugly) part of that file is the handling of header files, which need to be included in specific order because our header files are such a interdependent mess. Rationalizing those files would be nice, but we might need to keep two versions just for downstream compatibility. Note that clang-format doesn't have a way to specify spacing between function definitions, but perhaps that has changed.

I think we will need to adjust line length for C++. I like the 80 character limit, but it is a bit cramped for C++. The Black default of 88 might be an option.

@mattip
Copy link
Member

mattip commented Dec 2, 2022

except the return type goes to the same line next to the function name

I am -1 on this. It is very convenient to be able to use grep on the codebase, and to depend on ^funcname to be able to find a function definition that starts at the beginning of a line. I would also be in favor of using that convention inside namespaces, forgoing the traditional indentation inside a namespace.

@seiko2plus
Copy link
Member Author

seiko2plus commented Dec 3, 2022

Version choice: I'll leave to you and the SciPy folks.

There may be objections to C++17, but C++14 is widely supported; see compiler_support and its already part of the meson build:

numpy/meson.build

Lines 10 to 17 in 4c4e803

default_options: [
'buildtype=debugoptimized',
'c_std=c99',
'cpp_std=c++14',
'blas=openblas',
'lapack=openblas',
'pkgconfig.relocatable=true',
],

Rationalizing those files would be nice, but we might need to keep two versions just for downstream compatibility.

Agree, also to qualify the headers to support pch.

I think we will need to adjust line length for C++. I like the 80 character limit, but it is a bit cramped for C++. The Black default of 88 might be an option.

An 80-character limit with few exceptions similar to Google's guide sounds good to me.

I am -1 on this. It is very convenient to be able to use grep on the codebase, and to depend on ^funcname to be able to find a function definition that starts at the beginning of a line.

Can we handle it via cpplint? Wouldn't make the contributors get confused even if we address it within our style document? Personally, I rely on ctags to jump on functions definitions, declarations. maybe grep "funcname(" or awk match.

EDIT: meson

@eli-schwartz
Copy link

C++14 is basically a bugfix edition of C++11 and seems like a pretty safe choice 9 years later.

@rgommers
Copy link
Member

rgommers commented Dec 5, 2022

C++14 is fine as a base. I bumped the default to c++14 in the Meson build, because it looked like MSVC issued a warning about C++11 and supports C++14 better. Even C++17 should be okay, modulo the not-yet-universally-supported versions.

In general, let's follow http://scipy.github.io/devdocs/dev/toolchain.html#c-language-standards, where a lot of research about portability has already been documented. We can just add to that - the requirements for NumPy are pretty much identical to those for SciPy.

@seiko2plus
Copy link
Member Author

In general, let's follow http://scipy.github.io/devdocs/dev/toolchain.html#c-language-standards,

Great!, C++17 (core language + universally available stdlib features) is more than enough, Any objections?
C++17 will improve the readability of our future C++ code. Who would like to use C++ nowadays without a static if!?

I guess now the next step is to implement a CI test. We can count on both clang-tidy and clang-format. Google's cpplint doesn't cover everything we need. clang-tidy is an amazing tool for static analysis, and it's now possible to use it after brining meson.

@mattip
Copy link
Member

mattip commented Dec 6, 2022

The link @rgommers pointed to is about toolchains, not language features. I would prefer we go very slowly with adopting advanced C++ features. I consider our c.src templating a feature, not a bug: It is easy to see how the code is processed. Modern C++ hides too much behind the preprocessor and makes debugging very difficult: there is no telling what the actual code used looks like.

@rgommers
Copy link
Member

rgommers commented Dec 6, 2022

I consider our c.src templating a feature, not a bug: It is easy to see how the code is processed.

Not that C++ templates are very nice, but I don't think I agree that one-off hand rolled templating that no developer tool or newcomer to our code base understands is a "feature". It's a workaround for the limitations of C.

I would prefer we go very slowly with adopting advanced C++ features.

It'd be great to have a list of C++ features that we'd want to use, and that we can add to over time. As well as things in C (e.g., only recently we started using inline and restrict keywords, and I've wondered for a while if we could use _Generic to avoid some templating). Use of constexpr if and init statements for if and switch statements may be examples of where C++17 helps - they actually make the code cleaner and easier to understand.

I agree with the "not too much hard-to-grasp magic" sentiment - but I also think that there's a lot to gain.

@seiko2plus
Copy link
Member Author

seiko2plus commented Dec 7, 2022

@mattip,

I would prefer we go very slowly with adopting advanced C++ features

That's maybe why the Google style guide may fit NumPy. It simply removes the extra sign + and makes it sounds like C+.

It is easy to see how the code is processed. Modern C++ hides too much behind the preprocessor and makes debugging very difficult: there is no telling what the actual code used looks like.

Modern C++ STL but not the core language. Google's guide against C++ exceptions which means hasta la vista STL.

@rgommers,

It'd be great to have a list of C++ features that we'd want to use,

I think it can be done via clang-tidy

I've wondered for a while if we could use _Generic to avoid some templating)

Rather than bringing a c-11 feature, we can keep it simple by having a single traits np::Traits class covering all the constants and counting on function overloading as far as we can. It will require us to classify our array data types, e.g. Half, TimeDelta, DateTime etc.

Another thing, we should only count on fixed-width integers rather than on platform-specific types; see the following code (part of my local work) defining only 22 enumerated values rather than 24:

Enumerated types
#include "npstd.hpp"

namespace np {
/// @addtogroup cpp_core_types
/// @{
/**
 * Enumerated types defined providing the basic 22 data types
 */
enum class Types : uint8_t {
    /// The enumeration value for the boolean type, stored as one byte.
    /// It may only be set to the values 0 and 1.
    kBool = NPY_BOOL,
    /// The enumeration value for an unsigned 8-bit integer.
    kUInt8 = NPY_UINT8,
    /// The enumeration value for a signed 8-bit integer.
    kInt8 = NPY_INT8,
    /// The enumeration value for an unsigned 16-bit integer.
    kUInt16 = NPY_UINT16,
    /// The enumeration value for a signed 16-bit integer.
    kInt16 = NPY_INT16,
    /// The enumeration value for an unsigned 32-bit integer.
    kUInt32 = NPY_UINT32,
    /// The enumeration value for a signed 32-bit integer.
    kInt32 = NPY_INT32,
    /// The enumeration value for an unsigned 64-bit integer.
    kUInt64 = NPY_UINT64,
    /// The enumeration value for a signed 64-bit integer.
    kInt64 = NPY_INT64,
    /// The enumeration value for a 16-bit floating point (IEEE 754-2008 compatible).
    kHalf = NPY_HALF,
    /// The enumeration value for a 32-bit floating point (IEEE 754-2008 compatible).
    kFloat = NPY_FLOAT,
    /// The enumeration value for a 64-bit floating point (IEEE 754-2008 compatible).
    kDouble = NPY_DOUBLE,
    /// The enumeration value for a platform-specific floating point type which is at
    /// least as large as `Types::kFloat64`, but larger on many platforms.
    kLongDouble = NPY_LONGDOUBLE,
    /// The enumeration value for a 64-bit complex type made up of two `Types::kFloat` values.
    kCFloat = NPY_CFLOAT,
    /// The enumeration value for a 128-bit complex type made up of two `Types::kDouble` values.
    kCDouble = NPY_CDOUBLE,
    /// The enumeration value for a platform-specific complex floating point type
    /// which is made up of two `Types::kLongDouble` values.
    kCLongDouble = NPY_CLONGDOUBLE,
    /// The enumeration value for references to arbitrary Python objects.
    kObject = NPY_OBJECT,
    /// The enumeration value for ASCII strings of a selectable size.
    /// The strings have a fixed maximum size within a given array.
    kString = NPY_STRING,
    /// The enumeration value for UCS4 strings of a selectable size.
    /// The strings have a fixed maximum size within a given array.
    kUnicode = NPY_UNICODE,
    /// Primarily used to hold struct dtypes, but can contain arbitrary binary data.
    kVoid = NPY_VOID,
    /// The enumeration value for a data type which holds dates or datetimes
    /// with a precision based on selectable date or time units.
    kDateTime = NPY_DATETIME,
    /// The enumeration value for a data type which holds lengths of times in
    /// integers of selectable date or time units.
    kTimeDelta = NPY_TIMEDELTA
};
/// ... rest of the code
} // namespace np

@rgommers
Copy link
Member

rgommers commented Dec 7, 2022

I think it can be done via clang-tidy

That sounds good to me.

Another thing, we should only count on fixed-width integers rather than on platform-specific types; see the following code (part of my local work) defining only 22 enumerated values rather than 24:

This is probably best split off into a separate issue/topic. It is of interest to reduce the number of dtypes we're using. I will note that for array indexing, using 32-bit integers on 32-bit Python and 64-bit integers on 64-bit Python seems needed (better for performance, and a major break to change it).

@seiko2plus
Copy link
Member Author

seiko2plus commented Dec 7, 2022

This is probably best split off into a separate issue/topic.

It should be part of C++ guide; see Google's Integer_Types, It also may cause specialization parameter duplication. Plus, It doesn't make sense to provide separate operations for long datatype; taking it off will increase readability and save binary size.

using 32-bit integers on 32-bit Python and 64-bit integers on 64-bit Python seems needed (better for performance, and a major break to change it).

It can be solved by providing an alias to map between int32_t, and int64_t or just using intptr_t, rather than that we should no longer provide inner loops for each built-in data type.

@rgommers
Copy link
Member

rgommers commented Dec 7, 2022

It can be solved by providing an alias to map between int32_t, and int64_t or just using intptr_t, rather than that we should no longer provide inner loops for each built-in data type.

Ah yes, that makes sense. As long as we keep it in the list of types that we can't touch without changing the ABI, and we continue providing intp and other dtypes, then indeed there should be no need to have inner loops for them - intp will map to either int32 or int64.

@seiko2plus
Copy link
Member Author

, then indeed there should be no need to have inner loop

And the same goes for long double if it's defined as double.

@rgommers
Copy link
Member

rgommers commented Dec 7, 2022

Discussed in the community meeting as well, with a large part of the team present. Consensus there is that we're good with using C++17 and the Google Style guide. @seiko2plus plans to add one CI job using clang-tidy to verify adherence to that style guide and feature set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ Related to introduction and use of C++ in the NumPy code base
Projects
None yet
Development

No branches or pull requests

5 participants