API #80
Comments
👍 If this results in a significant code reorganization it might be prudent to also tackle the cross-platform/building issue at the same time. I've written an example barebones template on #49 that I think will work, but if not just let me know and I'll work on some alternatives. |
@ethanresnick thanks for getting this started. And I agree with the majority of your points. Here are my opinions. The library was initially broken down by field because, from a usability standpoint, it seemed more straightforward. You would never say "oh, my app needs matrices and sets." You'd say "I need statistics and certain calculus methods." With that, let's consider this restructure from two standpoints
Internal API
I agree with building around mathematical structures. The ones that appear to be most prevalent
These are some of the core building blocks of mathematics, and it makes sense for us to build around them as well. Internal methods (e.g. any and all methods that exist on or between a single structure) can remain in that file as well. For example
Developer API
We're building for the non-math-PhD developer. The concepts should be easily accessible and easy to grasp, and, most important, super easy to use. @davidbyrd11 captured this well: "everyone should feel like they need numbers." In regards to marketing this, it's important this remains at the center of our development, especially if we want wide adoption. A developer should be able to do a few things
Additional Notes
|
Definitely agree with Ethan and Steve! Same as above, organizing functions by their mathematical origins is definitely a better idea. This ought to reduce confusion when categorizing functions. With respect to non-object functions, I think it would be best to call it
I definitely agree with Ethan's comment. For this library's developers, it makes a lot of sense to organize it by mathematical origins, but to a user this might get confusing, assuming they don't really care about how it's done and just want to make stuff out of it. I mean, you don't wanna get to a stage where we have algebraic solvers but no one can find them because they are in
On that same note, I think you guys should definitely enable the repo wiki so that even if we don't have 'proper' documentation, we can at least make a giant list of all the functions someone can call. We should get to having a documentation part soon too. Maybe find one of those automatic doc generators? We should also use that wiki to do a roadmap or at least standardize the design decisions related to those object classes.
And finally, as pointed out by Steve, the purpose of this library is to be usable by an average person, not a Ph.D. This person may or may not have enough mathematical knowledge to interpret, appreciate or utilize everything this library offers. From their end, it should be like: "So I just copy paste this line from the examples and mash these together to get this done and in only one line, cool!" (I guess less exaggerated). If we stick to organizing functions by their origins, we should definitely consider aliasing a few things to either make an average user's life easier or just show off what this library offers. So as an example, having
Our goal from a user's perspective should definitely be the 3 things Steve pointed out! Those are just my thoughts and opinions. I did exaggerate most cases here but only to get across what I wanted to say. Feel free to correct me where I went wrong.
PS. Ethan, thanks for including me! :)
Edit: Just looked at JSDocs for this repo. I think that takes care of everything :) |
Hey all, this issue is great. We should keep fine-tuning this and set it for the version 1.0.0 milestone.
Ethan
Steve
I agree that Square Matrix shouldn't be a class but we should add a
Kartik
I don't feel like the where of functions matters as much. If it does, we can create wrappers in locations that are more obvious to the developer, but we should leave the library's structure primarily based on the internal aspects of the library. Also note that we currently have these comments above each function which help auto-generate JSDocs; Steve is working out a solution for where this really should move to.
My Additions
I think we should try to figure out where our priorities are in the library, namely how we want the developers to work with their data. Should we be flexible and allow for a lot of different programming paradigms? Should we be biased to performance? Should we be biased to usability? Some prime examples of this are:
Kartik mentioned
Another prime example, I think, is the way we currently work with arrays vs. other alternatives. For example, we could write this data manipulation with a map reduce methodology.
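The snippet that originally followed is not preserved in this copy. As a stand-in, here is a minimal sketch of what a map/reduce style could look like for a basic statistics routine. The function names mean and variance are hypothetical and are not taken from the numbers.js source.

```javascript
// Hypothetical helpers illustrating a map/reduce style; not the library's actual code.

// Sum the values with reduce, then divide by the count.
function mean(values) {
  var total = values.reduce(function (sum, x) {
    return sum + x;
  }, 0);
  return total / values.length;
}

// Population variance: map each value to its squared deviation, then average.
function variance(values) {
  var mu = mean(values);
  var squaredDeviations = values.map(function (x) {
    return (x - mu) * (x - mu);
  });
  return mean(squaredDeviations);
}

console.log(mean([1, 2, 3, 4]));      // 2.5
console.log(variance([1, 2, 3, 4]));  // 1.25
```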
This is not maximizing for performance, as in several browsers a for loop will be more performant, but it does help define the way in which we work with the data in statistics. I think that working in this way will help a lot of non-JavaScript developers get their feet wet with the library. It is also a way of writing JavaScript that a lot of developers have already grokked, so it wouldn't be that big of a jump from jQuery to this library. It also makes for easier code maintainability (now we only need to check the individual values rather than whether the object is an array). If we write for performance, though, the difference might be really negligible, and determining which runtime we use as the benchmark to optimize against (likely Node and V8) might also be important. |
First Set Commit as an example of most changes currently proposed in numbers#80.
Jumping back into this. Hope you all had nice holidays! I agree with most of the above and will create an API Principles page on the wiki summarizing these issues. Specific responses:
Math Fields vs. Data Structures Documentation
I think @KartikTalwar's idea of keeping the examples by mathematical field, in combination with offering a view into the docs by mathematical field, hits the right balance between usability and the internal structure.
Core/Util functions
I think there are really two types of code going on here. The first is the collection of functions that operate on integers. These seem like they should be in a static class because having to instantiate an
The second collection of functions, which could be called @milroc's Factory functions
I'm not sold on
Types
It seems like we're going with extending the built-in types, which is a big win for interoperability. Thinking more about the performance costs of lost encapsulation though, I wonder if the answer long-term will be to have two modes: a default one in which the user can transform the data however they want but that's a little slower (because there can't be any internal caching) and a mode the user can opt-in to in which they only transform the data through our API (or, if they must modify it directly, call
I've also been thinking more about type flexibility and the best way to achieve it. For example: a
Overall Structure
Given the above, here's the overall structure I see for the codebase;
Misc.
I also like the idea of aliases, though I'd add a couple qualifications. First, to the extent we have a "canonical name" separate from the aliases, that name should be the one used most often by our users (even if it's not the "proper" name) because none of this code is truly internal (others must contribute to it and may just take bits of it). Second, I'd cap the number of names for something in the codebase at 3, while trying to keep it at no more than 2; otherwise, things'll get too messy.
As for validation, I agree with @milroc that it needs to be centralized (at least within each "class"), but beyond that I'm not sold on a specific solution. We can discuss this in another issue though, as it's not related to the API. Ditto handling function objects.
Other final thoughts? |
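To make the alias idea from this comment concrete, here is a minimal sketch in which one canonical function is exposed under a capped number of extra names. The addAliases helper, the stats object, and the names mean and average are all hypothetical; only the cap of 3 names comes from the comment above.

```javascript
// Hypothetical sketch: register aliases for a canonical method, capped at 3 names total.
var MAX_NAMES = 3;

function addAliases(target, canonicalName, aliases) {
  if (typeof target[canonicalName] !== 'function') {
    throw new Error('No canonical function named ' + canonicalName);
  }
  // The canonical name counts toward the cap.
  if (aliases.length + 1 > MAX_NAMES) {
    throw new Error('Too many names for ' + canonicalName);
  }
  aliases.forEach(function (alias) {
    // Each alias points at the same function object, so behavior never diverges.
    target[alias] = target[canonicalName];
  });
  return target;
}

var stats = {
  mean: function (values) {
    return values.reduce(function (s, x) { return s + x; }, 0) / values.length;
  }
};

addAliases(stats, 'mean', ['average']);
console.log(stats.average([2, 4, 6])); // 4
```

Because every alias references the same function, there is only one implementation to validate and document.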
Something I thought of recently that might be a little more important is analyzing what it would cost to use node.js-only systems (node-fibers or something else). I don't know where we stack up performance-wise against other numerical/stat libraries. It might be worth adding an issue for someone to find out later in development. Another thought is to consider porting C++ libraries into numbers.js for node.js (not with llvm.js or emscripten, thus also meaning that we can't support browsers). This would need to be reserved for operations that aren't called frequently, since the cost of crossing from node into native code is rather high. Both of these mean that we wouldn't have browser support, from my understanding. I also am not sure about the performance benefits. I want to read what you wrote more in depth before I comment, but initially it looks good. |
Ethan, I read a little deeper and only two things really seemed worth mentioning to me:
Also note that:
|
Miles,
On reflection, I totally agree with your point 1. Let's put the single integer functions and the core stuff directly on
Also, as I've started to take a stab at implementing
Re NaN, I can definitely see a place for it, but I also think it's too silent/slippery to replace throwing errors in most cases. |
So if we could collapse this to one definitive "API v0.2.0" I'd be happy to try to convert everything to that by the end of this week or next. Or someone else could if they'd like to. |
I'll work on the definitive doc. Your help with the code conversion is much appreciated! |
This doc is now in progress on the wiki. Also, I'm going to start putting some of these things we've discussed into action on the abstract-structures branch. |
I've been thinking a lot lately about what the API should look like, which seems important to nail down before we add too much more code (things'll just get harder to change and the API's key for marketing/usability). Let's use this issue to try to figure out the overarching principles.
Right now we have two conflicting modes of operation going on:
If we're moving in the object direction, I think we need to figure out how we're going to keep that feeling light and usable—cause it can get clunky really fast.
A couple principles came up from the last discussion:
Having factory methods on numbers that make it easy to get from built-ins to our custom objects at the start of the chain. I.e.: numbers.createMatrix(arr).matrixMethod(). Instead of the createX naming, we could also do newX or makeX, but, please, let's not end up with numbers.matrix(arr); there should be something in the method name that implies that an object is being created.
All object methods that take another custom object as an argument (e.g. Vector getting the distance between another Vector) should also take a built-in representation of that other object wherever possible. @milroc suggested this and I totally agree.
(One thing to consider here is whether passing in a native structure rather than one of our objects should ever cause the return type to also be a built-in rather than one of our custom objects. Though this may be a moot point if we have our objects extend from the built-ins...see below.)
We should also shorten some of our method names in general (along the lines of what @revivek did in Matrix #57)
Combining the first, then, calculating the distance between two vectors would look like: numbers.createVector([0,1,3,4]).distanceFrom([3,5,6,7]), as opposed to the current:
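The snippet showing the current style is not preserved in this copy. Separately, the proposed style (a factory method at the start of the chain, plus methods that accept a built-in in place of a custom object) could be sketched roughly as follows. The Vector constructor and its internals are hypothetical; only the names createVector and distanceFrom come from the text above, and this is not the library's implementation.

```javascript
// Hypothetical sketch; the Vector internals here are illustrative only.
function Vector(components) {
  this.data = components.slice(); // copy so later changes to the input don't leak in
}

// Accept either a Vector or a plain array as the other operand.
Vector.prototype.distanceFrom = function (other) {
  var otherData = other instanceof Vector ? other.data : other;
  if (!Array.isArray(otherData) || otherData.length !== this.data.length) {
    throw new Error('Expected a Vector or array of matching length');
  }
  var sumOfSquares = this.data.reduce(function (sum, x, i) {
    var diff = x - otherData[i];
    return sum + diff * diff;
  }, 0);
  return Math.sqrt(sumOfSquares);
};

var numbers = {
  // "create" in the name signals that a new object is returned.
  createVector: function (arr) {
    return new Vector(arr);
  }
};

console.log(numbers.createVector([0, 0]).distanceFrom([3, 4]));                       // 5
console.log(numbers.createVector([0, 0]).distanceFrom(numbers.createVector([3, 4]))); // 5
```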
Then I've also been thinking about a couple other principles.
Maybe our data types could extend the built-in objects. So Matrix would extend Array, for instance, and you could do things like numbers.createMatrix(arr).transpose()[0][0], because the Matrix returned by .transpose() would also be an Array.* This could be super convenient. Most significantly, it would let our data structures interact with other libraries that expect native arrays.
The downside is that it requires giving up some encapsulation. For instance, the idea of caching the length property in rowCount goes away because the data can now be updated without the object knowing about it. Similarly, if we had a Set object you could imagine caching the mean/average in an instance property to speed up a lot of calculations, but allowing direct access to this data would make it impossible to automatically know when to invalidate this cache. (Note that in both of these examples the data could really have been updated directly anyway, but at least in the non-array approach you'd have had to go through .data, which could easily be documented as internal or even renamed .__data.)
One option would be to say in the docs that the underlying Array methods should only be used if they don't transform the underlying data. That would still leave some utility in the native Array interface (e.g. direct access to an element like in the transpose example, but also access to a row or set of rows with .slice, the ability to loop over rows in implementations that support .forEach, etc).
Another option would be to create a naming convention for a method that recalculates any internal properties, i.e. we could say "go crazy with the native array interface, setting things, deleting them, adding them, whatever, and then just call .update() or whatever to reset the key stuff in the internal state". Having to call an update() seems a little much though.
Btw, another really cool application of extending the built-ins: applying it to functions too. So we could have something like:
If we decide not to have our objects extend the built-in data types, then we should create a consistently-named method on every object for getting from that object to a representation of it using a built-in structure. Maybe something like toBuiltIn. That way the user can call this method at the end of the chain before handing their data off to the next part of their application.
For convenience, I think we should allow subclass methods to be called on super-class objects where applicable. For instance, if someone has created a Matrix object and tries to call determinant, that should transparently forward the call to the determinant on SquareMatrix (if the Matrix is square).
I don't like the idea of the superclass definition knowing about its subclasses (the coupling seems way too tight), but we should be able to avoid that if we just keep all the code for adding the forwarding (which would modify Superclass.prototype directly) in a separate part of the codebase. That way, there'd still be one location with the primary superclass definition and that chunk of code could easily be transplanted into another project and operate without any dependencies on the subclass.
Finally, there were two other things that were bothering me:
If there are things that we don't want to put in objects, what are they and where should those go? As I mentioned in the other issue, one example of this might be the methods that operate on single numbers, because having to create a Number object to house those methods seems like overkill. In the other issue, I proposed putting these in a numbers.util "static class" that would function basically how the library does now. But is that the best option? What are the alternatives?
As we think about restructuring in terms of objects, the objects that make the most sense to me seem to be those around mathematical constructs (Set, Sequence, Distribution, Function, etc), but this is very different from our current taxonomy which is based around mathematical fields (calculus, stats, linear algebra, primality, etc).
Now, on one hand, I can see this switch actually resolving some ambiguity and confusion. For instance, why are min and max on basic whereas median and mode are on stats? In a restructuring, they'd all be united under Set.
But I'm worried that having these mathematical constructs as the top-level organizational structures might make the library seem less accessible from the outside. Maybe that's just a documentation issue, though? I.e. we could tag each method with the mathematical fields it's relevant to, and then the docs would still be able to show all the stats methods or all the calc methods. The other option would be to use these structures internally but somehow expose an API structured like the current one...but I can't imagine how that would work. Does this seem like a big problem?
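As one way to picture the tagging idea, here is a minimal sketch in which methods are registered under a structure but also carry field tags that a docs generator could filter on. The registry, the register and methodsForField helpers, and the specific tags assigned are all hypothetical and chosen only for illustration.

```javascript
// Hypothetical sketch: methods live on structures, and each carries field tags for the docs.
var registry = [];

function register(structure, name, fields, fn) {
  registry.push({ structure: structure, name: name, fields: fields, fn: fn });
}

register('Set', 'min', ['basic', 'stats'], function (values) {
  return Math.min.apply(null, values);
});
register('Set', 'median', ['stats'], function (values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  var mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
});
register('Matrix', 'determinant', ['linear algebra'], function () {
  throw new Error('Not implemented in this sketch');
});

// Build a docs-style listing of every method relevant to one field.
function methodsForField(field) {
  return registry
    .filter(function (entry) { return entry.fields.indexOf(field) !== -1; })
    .map(function (entry) { return entry.structure + '.' + entry.name; });
}

console.log(methodsForField('stats').join(', ')); // Set.min, Set.median
```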
Overall thoughts? Additions?
Sorry for the length, but the API is arguably the most important design decision for the library's success, so it seemed worth a full discussion.
CCs: @sjkaliski, @davidbyrd11. Also @KartikTalwar, who's been contributing a lot so might want to follow these developments.
* Extending Array in JavaScript is a mess, but it can be done workably by having an object constructor that just returns a native array with methods tacked onto it directly ("parasitic inheritance" in Crockford-ese). And this can even be performant if the constructor tacks on functions which are only created once in an outside scope and then simply referenced by the returned array's properties...somehow, this even seems to end up faster than standard prototypal inheritance (I guess because Chrome really optimizes Array construction). I made a test for this here.
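The approach described in the footnote could be sketched roughly like this: a factory returns a genuine native array and attaches method references that were defined once in an outer scope. The createMatrix and transpose bodies are hypothetical illustrations, not the code from the linked test or from the library.

```javascript
// Hypothetical sketch of "parasitic inheritance": a real Array with methods attached.
// The method function is created once, outside the factory, and only referenced per instance.
function transpose() {
  var rows = this;
  var cols = rows.length ? rows[0].length : 0;
  var result = [];
  for (var c = 0; c < cols; c++) {
    // Collect column c of the original as row c of the result.
    result.push(rows.map(function (row) { return row[c]; }));
  }
  return createMatrix(result); // the result is also a decorated native array
}

function createMatrix(rows) {
  var matrix = rows.map(function (row) { return row.slice(); }); // native Array of copied rows
  matrix.transpose = transpose; // tack the shared function onto this array directly
  return matrix;
}

var m = createMatrix([[1, 2, 3], [4, 5, 6]]);
console.log(Array.isArray(m));     // true
console.log(m.transpose()[0][1]);  // 4
console.log(m.transpose().length); // 3
```

Because the returned value is a plain array, native methods such as slice and forEach keep working, which is the interoperability benefit discussed in the issue; the trade-off is the lost encapsulation described there.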