Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract base class for map types #479

Open
awvwgk opened this issue Jul 30, 2021 · 23 comments
Open

Abstract base class for map types #479

awvwgk opened this issue Jul 30, 2021 · 23 comments
Labels
idea Proposition of an idea and opening an issue to discuss it topic: container (Abstract) data structures and containers

Comments

@awvwgk
Copy link
Member

awvwgk commented Jul 30, 2021

Description
This issue should track a potential addition of a map class to stdlib. I think it would be preferable to first define an abstract base class which covers most of our needs. Here are some points I found important when working on TOML Fortran:

  • data structure owns data
    • memory safety
    • easy deep copies
    • no cyclic references
  • get / find an element based on a key
    • returns pointer to value inside structure
  • add / push back an element for a key
    • value must be allocatable
  • delete / drop an element at a given key
    • should value (optionally) be returned on delete / drop?
  • get a list of all keys / provide an iterator over all keys
  • keys can be strings
    • what about integer, real, ... values?

Examples

Abstract base class used in TOML Fortran to define maps
type, abstract :: map_class
contains
  !> Find a value based on its key
  procedure(find), deferred :: find
  !> Push back a value to the structure
  procedure(push_back), deferred :: push_back
  !> Get list of all keys in the structure
  procedure(get_keys), deferred :: get_keys
  !> Delete a value at a given key
  procedure(delete), deferred :: delete
  !> Destroy the data structure
  procedure(destroy), deferred :: destroy
end type

abstract interface
  !> Find a value based on its key
  subroutine find(self, key, ptr)
    import :: map_class, value_type
    !> Instance of the structure
    class(map_class), intent(inout), target :: self
    !> Key to the value
    character(len=*), intent(in) :: key
    !> Pointer to the stored value at given key
    type(value_type), pointer, intent(out) :: ptr
  end subroutine find
  !> Push back a value to the structure
  subroutine push_back(self, val)
    import :: map_class, value_type
    !> Instance of the structure
    class(map_class), intent(inout), target :: self
    !> Value to be stored
    type(value_type), allocatable, intent(inout) :: val
  end subroutine push_back
  !> Get list of all keys in the structure
  subroutine get_keys(self, list)
    import :: map_class, string_type
    !> Instance of the structure
    class(map_class), intent(inout), target :: self
    !> List of all keys
    type(string_type), allocatable, intent(out) :: list(:)
  end subroutine get_keys
  !> Delete a value at a given key
  subroutine delete(self, key)
     import :: map_class, value_type
     !> Instance of the structure
     class(map_class), intent(inout), target :: self
     !> Key to the value
     character(len=*), intent(in) :: key
  end subroutine delete
  !> Deconstructor for data structure
  subroutine destroy(self)
     import :: map_class
     !> Instance of the structure
     class(map_class), intent(inout), target :: self
  end subroutine destroy
end interface
@awvwgk awvwgk added idea Proposition of an idea and opening an issue to discuss it topic: container (Abstract) data structures and containers labels Jul 30, 2021
@awvwgk awvwgk changed the title Abstract base class for maps types Abstract base class for map types Jul 30, 2021
@milancurcic
Copy link
Member

Is map the same as dictionary (e.g. Python's dict) or do they differ in some ways?

In Python I work with this a lot. In Fortran, not really. But this is one of the features that I think, even if not presently used a lot, would open up many possibilities for development of high level libraries. For example, databases (SQL-style or otherwise), various file formats, parsing HTML, SVG, and other kinds of XML, HTTP clients and servers.

Fortran-native approach is the way to go.

An alternative, perhaps for its own package rather than stdlib, would be Fortran bindings for Redis, which provides high performance in-memory dicts and lists.

@arjenmarkus
Copy link
Member

arjenmarkus commented Aug 26, 2021 via email

@awvwgk
Copy link
Member Author

awvwgk commented Aug 26, 2021

I was thinking C++ ordered map, but Python dictionary is the same thing.

Having an abstract base class would allow to implement the map type with any backend, stdlib would only provide the base class and pure Fortran implementations. Other (fpm) projects could extend the base class and provide their own optimized implementation.

The advantage of a single well-designed abstract base class would be that projects like TOML Fortran don't have to implement their own, but you could simply load a TOML document into a stdlib map type and use it with any other library which supports stdlib's maps. Designed correctly a library could also load data directly into an optimized map implementation.

@FObermaier
Copy link

For the record, are you aware of fhash?

@wclodius2
Copy link
Contributor

FWIW I have just released a proposed hash map API on the website with my proposed hash function API. The API defines three modules: stdlib_32_bit_key_data_wrapper, stdlib_chaining_hash_map and stdlib_open_hash_map.

The stdlib_32_bit_key_data_wrapper module does the following: defines an abstract interface for hash functions, provides a wrapper for some hash functions so they match that interface; defines a key datatype, key_type whose contents are currently not private, but can be made abstract with setters and getters that take INT8 vectors and character strings as arguments; defines another datatype, other_type, for data that supplements the key, whose contents are currently not private, but can be made abstract with setters and getters that take INT8 vectors and character strings as arguments.

The module stdlib_chaining_hash_map implements a datatype, chaining_hash_map_type, for a simple separate chaining hash map. The API provides most of the functionality @awvwgk requests except it doesn't provide a procedure for returning a list of all keys, but that procedure is easy to implement.

The module stdlib_open_hash_map implements a datatype, open_hash_map_type, for a simple open addressing hash map with linear insertion. The API provides most of the functionality @awvwgk requests except it doesn't provide for returning a list of all keys, but that is easy to implement, and it doesn't provide a procedure for deleting elements. It is possible to provide an element deletion procedure for an open addressing hash map, but it is tricky to implement and the runtime cost is significantly higher than for a chaining hash map.

The API's for the two data types are very similar. The init_map procedure for open_hash_map_type has an additional load_factor argument missing from the chaining_hash_map_type. The open_hash_map_type also has an additional load_factor inquiry function. The chaining_hash_map_type also has the remove_entry subroutine that open_hash_map_type lacks.

The two APIs also have some inquiry functions on the structure and history of the hash maps that could be eliminated. In particular I think the total_depth function is less useful to casual users.

@wclodius2
Copy link
Contributor

FWIW I have just changed the implementation of key_type and other_type to be opaque with getters and setters and have updated hash_maps.md accordingly.

@LKedward
Copy link
Member

LKedward commented Oct 6, 2021

For the record, are you aware of fhash?

There's also my own implementation of the same name: LKedward/fhash.
(My focus for that project was mostly on providing a nice simple API without preprocessing - the underlying implementation is nothing special.)

@awvwgk
Copy link
Member Author

awvwgk commented Oct 6, 2021

Would it make sense to introduce an abstract base class which both implementation inherit from? Except for the delete feature which is absent in the open addressing implementation the APIs seem pretty much similar.

@wclodius2
Copy link
Contributor

@FObermaier the flash you refer to is based on the gcc hash table and, as a result, is probably covered by the GNU LGPL license, which I believe is not compatible with the MIT license used by the Fortran Standard Library.

@wclodius2
Copy link
Contributor

@awvwgk I have tried to keep the APIs for the hash tables very similar. The init procedures differ, but that is because I provide additional features through optional arguments that most users would not use. I should probably reduce it to four arguments: map specifying the hash table to be initialized, hasher specifying the hash function to be used by the table, and the optional slots_bits specifying the initial size of the table, and status for reporting errors with the initialization. This drops the max_bits and load_factor arguments. If load_factor is a constant then I probably don't need the load_factor inquiry function for the open_hash_map_type. I also have figured out how to implement the remove_entry procedure for the open_hash_map_type and plan to implement it. With those changes the APIs would be identical, and giving them a common base class would be straight forward, but tedious.

@FObermaier
Copy link

the fhash you refer to is based on the gcc hash table and, as a result, is probably covered by the GNU LGPL license, which I believe is not compatible with the MIT license used by the Fortran Standard Library.

No, it is not based on it, it is an implemention of the GCC hashmap structure in Fortran. Not a port of the GCC code. I think that is fine and it is MIT licensed, too.

@LecrisUT
Copy link

Any news on this?

@jvdp1
Copy link
Member

jvdp1 commented Jun 15, 2023

@LecrisUT Hash maps have been implemented in stdlib. See here for more detail. Is it what you are looking for?

@LecrisUT
Copy link

The documentation is immense and hard to navigate. Is there a rosetta stone of how to do the python equivalent of:

dict = {}
dict.__contains__("key")
dict["key"] = 2
dict.pop("key")

@jvdp1
Copy link
Member

jvdp1 commented Jun 15, 2023

@LecrisUT I agree with you that the specs are quite difficult to follow and is not written for users. Unfortunately, I never found the time to write a tutorial.
Anyway, I just wrote a small example for you (see below). You can easily compile it with fpm (I used fpm v0.9.0 with the dependency stdlib="*"). Is such an example useful for you?

program main
 use stdlib_kinds, only: int8
 use stdlib_hashmap_wrappers, only: fnv_1_hasher, key_type, other_type, set, get
 use stdlib_hashmaps, only: chaining_hashmap_type, default_bits
 implicit none
 integer, parameter :: nentries = 50

 type dummy_type
  integer :: i
  real :: myval(4)
 end type dummy_type

 integer :: i
 logical :: conflict, exists
 type(dummy_type) :: dummy

 type(key_type) :: key
 type(other_type) :: other
 type(chaining_hashmap_type) :: map
 class(*), allocatable :: data

 call map%init(fnv_1_hasher)

 !Storing data
 do i = 1, nentries
  dummy%i = i
  call random_number(dummy%myval)
  call set(key, [transfer("dummy", 1_int8, len("dummy")), transfer(i, 1_int8)])
  call set(other, dummy)
  call map%map_entry(key, other, conflict)
  if(conflict) error stop "Unable to map entry because of a key conflict"
 enddo

 write(*, '(a)')'Querying table info...'
 write(*, '(a,t40,i0)')'Number of buckets allocated: ',map%num_slots()
 write(*, '(a,t40,i0)')'Number of key-value pairs stored: ',map%entries()
 write(*, '(a,t40,i0)')'The worst case bucket depth is ',map%total_depth()


 !Retrieving data
 do i = 1, nentries
  call set(key, [transfer("dummy", 1_int8, len("dummy")), transfer(i, 1_int8)])
  call map%get_other_data(key, other, exists)
  if(.not.exists)write(*,'(a)')'Warning: missing key other'

  call get(other, data)

  select type (data)
  type is (dummy_type)
   print *, 'Other data % i = ', data%i
   print *, 'Other data % myval = ', data%myval
  class default
   print *, 'Invalid data type in other'
  end select
 enddo

end program main

@LecrisUT
Copy link

So to get this straight:

  • other_type = value_type
  • you have to manually create values for the objects key and other using the set() function so that they can be used by the dictionary
    • what's the [transfer ...] doing?
    • why not at have set as a method of key_type/other_type?
  • map_entry == operator[]
  • and then just reverse the operation for get

That is quite a lot of boilerplate, but at least it manages to get me started. Will there be a simpler interface some day?

@jvdp1
Copy link
Member

jvdp1 commented Jun 15, 2023

I think you got the idea.

So to get this straight:

  • other_type = value_type
  • you have to manually create values for the objects key and other using the set() function so that they can be used by the dictionary
    • what's the [transfer ...] doing?

set only supports int8 arrays or character. Therefore, transfer is used to copy the bitwise representation of the original key to the int8 array.
The user can write her/his own code for her/his specific variable.

Note that in my example, my key is composed of both character and integer.

  • why not at have set as a method of key_type/other_type?

Actually the set is not required. For example, the following lines could be replaced by:

 call set(key, [transfer("dummy", 1_int8, len("dummy")), transfer(i, 1_int8)])
 call set(other, dummy)
 call map%map_entry(key, other, conflict)

by (with key aliasing key_type and other aliasing other_type):

  call map%map_entry(key([transfer("dummy", 1_int8, len("dummy")), transfer(i, 1_int8)]), other(dummy), conflict)
  • map_entry == operator[]
  • and then just reverse the operation for get

That is quite a lot of boilerplate, but at least it manages to get me started. Will there be a simpler interface some day?

I guess it is. In my opinion, it gives a lot of (too much) flexibility to the users. I don't think there are plans to simplify the interfaces. However, wrappers could be added, and we are open to feedbacks from users to improve it.

@LecrisUT
Copy link

Well, one thing I couldn't find is how to store and retrieve pointers instead of values

@jvdp1
Copy link
Member

jvdp1 commented Jun 15, 2023

Well, one thing I couldn't find is how to store and retrieve pointers instead of values

Do you mean something like that:

 input = 'aaaarrrr'

 output => input

 call map%map_entry(key_type([transfer("pointer", 1_int8, len("pointer"))]), other_type(output), conflict)

 call map%get_other_data(key_type([transfer("pointer", 1_int8, len("pointer"))]), other, exists)

 call get(other, data)

 select type (data)
 type is (character(*))
  print *, 'character = ', data
 class default
  print *, 'Invalid data type in other'
 end select

@LecrisUT
Copy link

I think it is when using non-intrisic data types so that it can only use class(*), pointer which cannot upcast to class(my_type), pointer and then it complains that it can't find an implementation for get(). Probably I can just overload get() to dynamically cast the pointer.

Fortran language really needs templating to cut down all of the boilerplate.

@jvdp1
Copy link
Member

jvdp1 commented Jun 16, 2023

I think it is when using non-intrisic data types so that it can only use class(*), pointer which cannot upcast to class(my_type), pointer and then it complains that it can't find an implementation for get(). Probably I can just overload get() to dynamically cast the pointer.

I think overloading it is indeed the right way.

Fortran language really needs templating to cut down all of the boilerplate.

I agree with you.

@LecrisUT
Copy link

Took me a bit of digging, but I found an issue with the previous example:

! This one only sets transfers "d"
call set(key, [transfer("dummy", 1_int8)])
! Should be
call set(key, [transfer("dummy", 1_int8, len("dummy")])

@jvdp1
Copy link
Member

jvdp1 commented Jun 30, 2023

@LecrisUT Good catch! Thank you. I'll edit my posts. Sorry for the troubles!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
idea Proposition of an idea and opening an issue to discuss it topic: container (Abstract) data structures and containers
Projects
None yet
Development

No branches or pull requests

8 participants